Learn what can cause delays in ServiceLevelEvents agent processing.
Learn the correct approaches to setting Service-Level Agreements for Stages, Processes and Steps to avoid delays in your specific scenario.
Issue
You notice delays in ServiceLevelEvents agent processing.
Symptoms and Impact
- Service Level Agreements (SLAs) for cases, stages, processes, assignments, and/or wait steps are not processed at the expected time.
- Increases in backlog of System-Queue-ServiceLevel queue items.
Steps to Reproduce
- Configure an SLA on a case type, stage, flow, or assignment, or configure a wait step on a case type.
- Process a case until it reaches the defined scope of the SLA.
- Allow Case processing time to elapse.
Explanation
Delays in SLA processing can be due to a variety of causes. The most frequently reported causes are:
- Calls to slow or unavailable external services (for example, through Connect-* rules) invoked from custom code that executes during SLA processing.
- Significant processing in custom code that runs during SLA processing.
- Custom retry mechanisms with minimal retry delays.
- Very short wait times.
- Insufficient instances of the ServiceLevelEvents agent.
- Custom processes queuing more items than the currently available instances of the agent can process. This can happen because the agent is not running on multiple nodes, because too many cases are queued in a short period, or because of a combination of both.
- Agent thread pool exhaustion, caused by other agents using all the available threads.
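The capacity-related causes in this list reduce to simple arithmetic: the backlog grows whenever items are queued faster than the available agent instances can process them. The following is a minimal sketch of that reasoning. The function names and all figures are hypothetical and are not part of Pega Platform; substitute values measured in your own environment.

```python
import math


def backlog_after(minutes, arrivals_per_min, seconds_per_item, instances, start_backlog=0):
    """Estimate the queue backlog after `minutes` of steady load."""
    # Each agent instance can process 60 / seconds_per_item items per minute.
    capacity_per_min = instances * 60.0 / seconds_per_item
    net_per_min = arrivals_per_min - capacity_per_min
    # A backlog cannot drop below zero.
    return max(0.0, start_backlog + net_per_min * minutes)


def instances_needed(arrivals_per_min, seconds_per_item):
    """Smallest number of instances whose combined capacity meets the arrival rate."""
    return math.ceil(arrivals_per_min * seconds_per_item / 60.0)


# Hypothetical load: 120 items queued per minute, 2 seconds of processing per item.
print(backlog_after(30, 120, 2, instances=2))  # backlog after 30 minutes with 2 instances
print(instances_needed(120, 2))                # instances needed to keep up
print(instances_needed(120, 5))                # same load if a slow service call raises the cost to 5 s
```

With these assumed numbers, two instances fall behind by 60 items every minute, and a slow external call that raises the per-item cost from 2 to 5 seconds increases the required instance count from 4 to 10. This is why reducing per-item processing time is often as effective as adding nodes.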
Solution
Use the following approaches to help isolate the cause in your specific scenario:
- Confirm that the ServiceLevelEvents agent is enabled and is running on enough nodes to process the expected number of queued items.
- Check the ServiceLevelEvents agent's last run time and next run time to determine whether an agent run is stuck.
- Review the volume of System-Queue-ServiceLevel queue items. If the volume is greater than expected, review a sample of items to determine whether one or more processes are queuing extraneous items for SLA processing. The pyMinimumDateTimeForProcessing value of scheduled queue items can help to confirm whether the agent is running behind.
- Trace the ServiceLevelEvents agent to look for any processes that take a long time to run while the agent is executing. For any that you find, consider reducing their processing time or moving them outside the context of SLA processing.
- Review PDC and/or the PegaRULES-ALERT logs for events or alerts associated with the ServiceLevelEvents agent or with the ProcessEvent activity that the agent runs (for example, PEGA0020 or PEGA0005). For any that you find, consider addressing the cause of the event or alert, or moving the process that raises it outside the context of SLA processing.
- Review the number of agents that could potentially run at the same time as the SLA agent, and adjust the agent/threadpoolsize dynamic system setting (DSS) accordingly.
- For Pega Platform '23 and later, consider switching to queue processing for SLAs, which can improve scalability, reliability, and processing performance. For more information, see the documentation Queue processing for SLAs.
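The check on scheduled queue items can be sketched as follows. This is an illustration only. It assumes that the minimum processing date-times of the pending System-Queue-ServiceLevel items have already been exported (for example, from a report) into a list of timezone-aware values. The function name, the tolerance, and the sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def sla_lag(min_processing_times, now, tolerance=timedelta(minutes=1)):
    """Return (overdue_count, worst_lag) for pending queue items.

    An item counts as overdue when its minimum processing time lies more
    than `tolerance` in the past, so the agent should already have run it.
    """
    lags = [now - t for t in min_processing_times if now - t > tolerance]
    worst = max(lags, default=timedelta(0))
    return len(lags), worst


# Hypothetical sample data standing in for exported queue items.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
pending = [
    now - timedelta(minutes=45),  # should have been processed 45 minutes ago
    now - timedelta(seconds=30),  # due, but still within the tolerance
    now + timedelta(hours=2),     # scheduled for later, not overdue
]
overdue, worst = sla_lag(pending, now)
print(overdue, worst)
```

For the sample data, the sketch reports one overdue item with a worst lag of 45 minutes. A count or lag that keeps growing over repeated checks suggests that the agent is running behind rather than merely holding items scheduled for the future.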
Environment
Versions found in
This behavior can impact any version of the Pega Platform.
References
Setting Service-Level Agreements for Case resolution
Configuring delayed service-level processing
Pausing and resuming Processes in Cases
Application debugging by using the Tracer tool
Using job schedulers and queue processors instead of agents