Question
ING
NL
Last activity: 7 Mar 2024 6:33 EST
Quiescing and graceful shutdown in Kubernetes environments
Good day.
Currently, we are attempting to implement zero-downtime rolling restarts of Pega deployments in Kubernetes.
Pega is deployed via up-to-date Helm charts, into its own namespace.
What we expect to see during a rollout restart is the pega-web pods being terminated gracefully and all clients connected to those pods via the Kubernetes Ingress/Service being moved gracefully to new pods.
Unfortunately, we cannot quite achieve this yet.
What happens is that when a pega-web pod is scheduled for termination, it is immediately removed from the Endpoints resource and thus becomes unavailable to clients and ingresses.
No network connectivity back to Ingress/end users is allowed.
This is as expected, per design of Kubernetes itself.
This unfortunately means downtime and loss of sessions for our end users, guaranteed.
We have tried both quiescing with immediate drain and slow drain, but the results do not differ much.
We have tried adding a preStop lifecycle hook to allow connection draining to happen before quiescing starts.
But alas, users connected to the pod that is being terminated are losing their sessions and state.
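For context, a preStop hook of the kind described could be sketched as below. This is only an illustration, not the exact configuration used here: the helper name add_prestop_delay, the namespace argument, the container name pega-web-tomcat, and the 30 and 120 second values are all assumptions, and the container image is assumed to provide sh and sleep. A patch applied this way to a Helm-managed deployment would likely be overwritten by the next Helm upgrade.

```shell
# Hypothetical helper: adds a preStop sleep to the pega-web deployment so the
# container keeps running for a while after the pod has been marked for
# termination, before it receives SIGTERM.
add_prestop_delay() {
  namespace="$1"   # assumed namespace, e.g. mypega

  # Strategic merge patch: the container is matched by name.
  # "pega-web-tomcat" and both durations are assumptions, adjust as needed.
  kubectl patch deployment pega-web --namespace "$namespace" --type strategic --patch '{
    "spec": {"template": {"spec": {
      "terminationGracePeriodSeconds": 120,
      "containers": [{
        "name": "pega-web-tomcat",
        "lifecycle": {"preStop": {"exec": {"command": ["sh", "-c", "sleep 30"]}}}
      }]
    }}}
  }'
}
```

It would be called as add_prestop_delay mypega. The grace period has to be longer than the preStop sleep, because the time spent in the hook counts against it.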
Given your expertise with Pega Cloud (which also runs on Kubernetes), we would like to ask for advice on a configuration we can use to achieve zero-downtime restarts.
This can be Kubernetes configuration (such as using EndpointSlices instead of Endpoints) or Pega configuration; we are open to any and all suggestions.
@EugeneK17074130
To achieve zero-downtime restarts in Kubernetes, you should use the kubectl rollout restart command. This command gradually deletes and replaces all pods in a set so that there are always active pods running and users do not experience downtime. For example, to restart the pega-web tier, you would use the command kubectl rollout restart deployment/pega-web --namespace mypega. You can restart Pega Platform nodes in any order, either sequentially or in parallel. This method is recommended to avoid downtime when restarting Pega Platform nodes. Please note that this method assumes that you have a sufficient number of pods running to handle the load during the restart process.
⚠ This is a GenAI-powered tool. All generated answers require validation against the provided references.
🌕 Restarting nodes in containerized deployments
🌕 Environment restarts > Considerations for any Pega Cloud restarts
🌕 Installing, patching, or updating Pega software in Kubernetes deployments > Updating the Pega software in your deployment
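The restart command from the answer above can be combined with a wait for the rollout to finish, roughly as in this sketch. The helper name restart_pega_web and the 15 minute timeout are assumptions; the deployment name and the mypega namespace come from the example in the answer.

```shell
# Hypothetical helper: triggers a rolling restart of the pega-web tier and
# then blocks until the rollout completes or the timeout expires.
restart_pega_web() {
  namespace="$1"   # e.g. mypega, as in the answer's example

  kubectl rollout restart deployment/pega-web --namespace "$namespace" &&
    kubectl rollout status deployment/pega-web --namespace "$namespace" --timeout=15m
}
```

It would be called as restart_pega_web mypega. The status command only reports whether the new pods became ready; it does not show whether existing user sessions survived the restart.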
If you need further help I would suggest that you log a support issue with our cloud team to discuss your needs.