Learn what can cause delays in ServiceLevelEvents agent processing.
Learn the correct approaches to setting Service-Level Agreements for Stages, Processes and Steps to avoid delays in your specific scenario.
Issue
You notice delays in ServiceLevelEvents agent processing.
Symptoms and Impact
- Service Level Agreements (SLAs) for cases, stages, processes, assignments, and/or wait steps are not processed at the expected time.
- Increases in backlog of System-Queue-ServiceLevel queue items.
Steps to Reproduce
- Configure an SLA on a case type, stage, flow, or assignment, or configure a wait step on a case type.
- Process a case until it reaches the defined scope of the SLA.
- Allow Case processing time to elapse.
Explanation
Delays in SLA processing can have a variety of causes. The most frequently reported causes are:
- Calls to slow or unavailable external services (for example, through Connect-* rules) invoked from custom code that runs during SLA processing.
- Significant processing in custom code that runs during SLA processing.
- Custom retry mechanisms with minimal retry delays.
- Very short wait times.
- Insufficient instances of the ServiceLevelEvents agent.
- Custom processes queuing more items than the currently available instances of the agent can process. This can be due to the agent not running on multiple nodes, too many cases being queued in a short period, or a combination of those scenarios.
- Agent thread pool exhaustion, caused by other agents using all the available threads.
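To illustrate why minimal retry delays add load, the following Python sketch compares them with delays that grow after each failed attempt. It is illustrative only and is not Pega Platform code: the function name `backoff_delays` and all of its parameters are hypothetical, and the intervals it produces would have to be applied through whatever retry or wait configuration the application already uses.

```python
import random


def backoff_delays(base_seconds, factor, max_seconds, attempts, jitter=0.0, rng=None):
    """Return the delay (in seconds) to wait before each retry attempt.

    Delays grow geometrically from base_seconds and are capped at max_seconds.
    An optional jitter fraction adds a random extra of up to jitter * delay,
    which spreads out retries that would otherwise fire at the same moment.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        delay = min(base_seconds * (factor ** attempt), max_seconds)
        if jitter:
            delay += rng.uniform(0, jitter * delay)
        delays.append(delay)
    return delays


# Hypothetical values: start at 1 minute, double each time, cap at 15 minutes.
schedule = backoff_delays(base_seconds=60, factor=2, max_seconds=900, attempts=6)
print(schedule)
```

With a fixed one-second retry, a case that fails six times is queued six times within a few seconds. With the hypothetical schedule above, the same six attempts are spread over roughly 45 minutes, which leaves far fewer items competing for each agent run.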
Solution
Use the following approaches to help isolate the cause in your specific scenario:
- Confirm that the ServiceLevelEvents agent is enabled and is running on enough nodes to process the expected number of queued items.
- Check the ServiceLevelEvents agent's last run time and next run time to confirm whether an agent run may be stuck.
- Review the volume of System-Queue-ServiceLevel queue items. If the volume is greater than expected, review a sample of items to confirm whether one or more processes are queuing extraneous items for SLA processing. The pyMinimumDateTime for processing value for scheduled queue items can help to confirm whether the agent is running behind.
- Trace the ServiceLevelEvents agent to look for any processes that take a long time to run while the agent is executing. For any that you find, consider reducing their processing time or moving them outside the context of SLA processing.
- Review PDC and/or the PegaRULES-ALERT logs for events or alerts associated with the ServiceLevelEvents agent, or with the ProcessEvent activity that the agent runs (for example, Pega0020 or Pega0005). For any that you find, consider addressing the cause of the event or alert, or moving the process that raises it outside the context of SLA processing. This approach also applies when you use queue processing for SLAs, in which case the Pega0117 alert is relevant as well.
- Review the number of agents that could potentially run at the same time as the SLA agent, and adjust the agent/threadpoolsize dynamic system setting (DSS) accordingly.
- For Pega Release '23 and later, consider switching to queue processing for SLAs, which can improve scalability, reliability, and processing. For more information, see the documentation Queue processing for SLAs. Queue processing for SLAs is inherently more performant than the SLA agent. However, certain configuration issues, such as slow connector calls (Pega0020) or slow reports (Pega0005), can still delay SLA processing even when the queue processor is used. In such scenarios, you still need to review and address those configurations.
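The backlog check described above (comparing the minimum date and time for processing of queued items against the current time) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a Pega utility: it assumes that the timestamps have already been exported as text in the form `20240101T120000.000 GMT`, and the names `parse_timestamp` and `summarize_backlog`, as well as the sample values, are hypothetical.

```python
from datetime import datetime, timezone

# Assumed text format of the exported timestamps, e.g. "20240101T120000.000 GMT".
ASSUMED_FORMAT = "%Y%m%dT%H%M%S.%f GMT"


def parse_timestamp(value):
    """Parse one exported timestamp string into a timezone-aware UTC datetime."""
    return datetime.strptime(value, ASSUMED_FORMAT).replace(tzinfo=timezone.utc)


def summarize_backlog(min_datetimes, now):
    """Count queue items whose minimum processing time has already passed.

    Returns the total number of items, the number that are overdue at `now`,
    and the largest lag (in seconds) among the overdue items.
    """
    lags = []
    for value in min_datetimes:
        lag_seconds = (now - parse_timestamp(value)).total_seconds()
        if lag_seconds > 0:
            lags.append(lag_seconds)
    return {
        "total": len(min_datetimes),
        "overdue": len(lags),
        "max_lag_seconds": max(lags, default=0.0),
    }


# Hypothetical sample: two items are already due, one is scheduled for later.
sample = [
    "20240101T120000.000 GMT",
    "20240101T120900.000 GMT",
    "20240101T121500.000 GMT",
]
now = datetime(2024, 1, 1, 12, 10, 0, tzinfo=timezone.utc)
print(summarize_backlog(sample, now))
```

A small lag is expected between agent runs. A maximum lag that keeps growing across repeated samples is a stronger indication that the agent is running behind.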
Environment
Versions found in
This behavior can impact any version of the Pega Platform.
References
Setting Service-Level Agreements for Case resolution
Configuring delayed service-level processing
Pausing and resuming Processes in Cases
Application debugging by using the Tracer tool
Using job schedulers and queue processors instead of agents