Question
Accenture
JP
Last activity: 21 Jun 2024 4:43 EDT
Deployment on AWS EKS with a focus on high availability
Hello,
I am in the process of setting up a Pega deployment on AWS EKS with a focus on high availability. My goal is to ensure that pods are evenly distributed across multiple AWS availability zones and spread across different hosts within those zones.
While I have found that topologySpreadConstraints in Kubernetes can be used to distribute pods at the zone level, this feature seems to be missing from the provided Pega Helm chart, particularly for clustering service and Search and Reporting Service (SRS) pods.
Is there a recommended approach or best practice within the Pega community for implementing both availability zone and host-level distribution for these services? Specifically, I'm looking for a way to include topologySpreadConstraints or an equivalent functionality in the Helm chart configuration for clustering service and SRS pods.
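For reference, the plain Kubernetes pod-spec fragment we are aiming for looks roughly like this (a sketch only; the app label is illustrative and would need to match the actual pod labels):

topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone    # spread evenly across availability zones
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: <pod-label>                        # illustrative selector
- maxSkew: 1
  topologyKey: kubernetes.io/hostname         # spread across hosts within each zone
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app: <pod-label>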
I would greatly appreciate any advice or insights on how to configure our environment to meet these high availability requirements.
Thank you for your time and assistance!
Yuju Tanaka
Eclatprime Digital Private Limited
SG
Hi Yuju,
In the Pega values.yaml there is a section around the HPA settings where you can define your node requirements and decide exactly where pods should be created inside the cluster.
Using that approach you can dedicate a few nodes to the web tier only, a few to batch, and a few to stream. This helps Pega because whenever a pod goes down, a replacement pod is created on the same set of nodes instead of on any other available node in the cluster.
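To illustrate the general idea in plain Kubernetes terms, here is a generic sketch (not the Pega chart's own syntax); the node label pega-node-role: web is hypothetical and you would have to apply it to your nodes or node groups yourself:

# Generic Kubernetes sketch, not Pega-chart syntax.
# Assumes the web-tier nodes carry the hypothetical label pega-node-role=web.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-tier                  # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-tier
  template:
    metadata:
      labels:
        app: web-tier
    spec:
      nodeSelector:
        pega-node-role: web       # pods schedule only onto nodes carrying this label
      containers:
      - name: web
        image: <your-web-image>   # placeholder image

With a nodeSelector like this, a rescheduled pod can only land on nodes carrying that label, which is the dedicated-nodes behaviour described above.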
Updated: 20 Jun 2024 21:40 EDT
Accenture
JP
Thank you for your response.
Based on our review, we have not found a method to control node placement using HPA (Horizontal Pod Autoscaler). It is likely that other features such as nodeSelector and affinity/anti-affinity rules would need to be combined for this purpose. If you could provide a specific example of how to configure node placement using HPA, we would greatly appreciate it.
Eclatprime Digital Private Limited
SG
Please go through the values.yaml file at the link below and go to the hpa section.
There you can see the topology and related settings.
https://github.com/pegasystems/pega-helm-charts/blob/master/charts/pega/values.yaml
Phalanx Consultancy Services
GB
Hey, you are right, there does not currently seem to be an option to include topologySpreadConstraints within the Helm charts.
Possibly wait for Pega to update the charts :)
or
One way to work around this is to deploy SRS using the current Helm chart, retrieve the deployment manifest, make the changes for the topology spread, and then redeploy the updated manifest. This is an easy (dirty) way to achieve HA and test, but most definitely not the right way, as the configuration drifts from the original Helm version.
The other (recommended) approach is to make the changes via Helm so that configuration revision integrity is maintained. This is a bit tricky: you have to make changes to the Helm template files in the chart and ultimately to the values.yaml.
- Download the Helm chart repo to your local working environment, navigate to "srs/templates/srsservice_deployment.yaml", and add the following block to the deployment's pod spec:
topologySpreadConstraints:
{{- range .Values.srsRuntime.topologySpreadConstraints }}
- maxSkew: {{ .maxSkew }}
  topologyKey: {{ .topologyKey }}
  whenUnsatisfiable: {{ .whenUnsatisfiable }}
  labelSelector:
    matchLabels:
      app: srs
{{- end }}
You can also use the indent technique to update the srsservice_deployment.yaml, as sketched below.
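A common Helm pattern for that indent technique is to render the whole list from values.yaml with toYaml and nindent. This is a sketch, assuming the same srsRuntime.topologySpreadConstraints value as above; adjust the nindent level to the depth at which the block sits inside the pod spec:

{{- /* Render the list defined in values.yaml; adjust nindent to the template's indentation. */}}
{{- with .Values.srsRuntime.topologySpreadConstraints }}
topologySpreadConstraints:
  {{- toYaml . | nindent 2 }}
{{- end }}

With this form, whatever is set under srsRuntime.topologySpreadConstraints in values.yaml is copied through to the rendered manifest, so the template does not need to be touched again when new fields are added.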
- Update the chart's values.yaml with details of the topology spread, for example:
srsRuntime:
  replicaCount: 2
  srsImage: <image>
  imagePullPolicy: "IfNotPresent"
  # topologySpreadConstraints:   ### like how it is mentioned in the pega/values.yaml
  # - maxSkew: <integer>
  #   topologyKey: <string>
  #   whenUnsatisfiable: <string>
  #   labelSelector: <object>
Once all the changes are done, use the locally updated Helm charts to deploy. I have not tested these changes, but hopefully this works. Please test it and let us know if it worked.
Updated: 20 Jun 2024 21:41 EDT
Accenture
JP
Thanks for your response.
The method you have presented certainly seems to work well. However, we are concerned that if we customize the helm chart provided by Pega, that environment will no longer be supported.
Phalanx Consultancy Services
GB
@YujuT16716329, sure, I understand. But currently there seems to be no option in the Helm deployment template to use node selectors, anti-affinity, etc.
So maybe you can do one thing without changing anything in the Helm charts:
- First, deploy as-is using Pega's Helm chart for SRS.
- Create separate manifests for the PDB, the HPA, and then for the deployment.
- Apply the PDB and HPA manifests, and then patch the deployment manifest onto your existing deployment.
Example:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: srs-pdb
  namespace: your-srs-namespace
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/instance: backingservices
      app.kubernetes.io/name: srs-service
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: srs-hpa
  namespace: your-srs-namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: your-srs-deployment-name
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: your-srs-deployment-name
  namespace: your-srs-namespace
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                - srs-service
            topologyKey: "kubernetes.io/hostname"
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: eks.amazonaws.com/nodegroup
                operator: In
                values:
                - default
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: srs-service
Change the topologyKey and the values for nodeSelectorTerms accordingly.