Contributed by Michael Cunanan and Rohit Patel
Symptoms
Errors
Explanations
Environments
Solutions
Related content
Symptoms
Full-system outage with a high volume of PEGA0037 Alerts for certain rules.
The Purge Agent did not finish processing and logged the error message "skipping purge because HasPurgeHappenedSinceStartup == false". This error occurs if the server is restarted within two (2) weeks. Your server restart schedule might not conform to this interval.
CPU usage in development environments spiked to 100 percent. This caused a major business impact because users could not work.
The cache tables and tmp files became bloated and continued to grow, causing system slowness. Purge was either not running or, when it ran, not deleting enough data. SQL queries were failing due to socket timeouts.
The Purge activity was failing to clear rule cache tables. In addition, the system had many used rules and unused application branches that caused the table pr4_rules_vw to keep growing.
Purge was scheduled to run weekly, and server restarts also happened weekly. The Purge schedule and the weekly server reboot conflicted, causing Purge runs to be skipped.
Purge was not running daily.
Database operations were very slow.
Errors
PEGA0037 Alerts
Log message:
skipping purge because HasPurgeHappenedSinceStartup == false
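To confirm whether this symptom applies, you can search the server log for the message above. The following Python snippet is a minimal sketch only: the sample log lines, their timestamp layout, and the helper name find_skipped_purges are hypothetical and are not part of Pega Platform.

```python
import io
from typing import Iterable, List, Tuple

# The exact message text reported in this document.
SKIP_MESSAGE = "skipping purge because HasPurgeHappenedSinceStartup == false"


def find_skipped_purges(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every log line that contains the skip message."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if SKIP_MESSAGE in line
    ]


# Hypothetical log excerpt; a real log will have a different layout.
sample_log = io.StringIO(
    "2023-01-01 01:00:00 INFO purge agent started\n"
    "2023-01-01 01:00:01 INFO skipping purge because HasPurgeHappenedSinceStartup == false\n"
)

hits = find_skipped_purges(sample_log)
for number, text in hits:
    print(f"line {number}: {text}")
```

To scan a real file, pass an open file handle instead of the in-memory sample.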
Explanations
The Purge Agent did not finish processing and logged the error message "skipping purge because HasPurgeHappenedSinceStartup == false". The cause was a local site policy of restarting nodes every week, which did not align with the system default purge policy. See Solution 1.
The Pega-provided default Purge schedule frequency for non-production environments is inadequate. See Solution 2.
Conflicting Purge schedule with weekly reboot causes runs to be skipped. See Solution 1.
The Pega-provided default settings for purging records are insufficient to keep up with the inflow of records in non-production environments. In some cases, Purge was observed to be running but not clearing enough data. The following default settings are adequate for production environments, where there is no high-volume development activity and no server restarts, but they are inadequate for non-production environments.
DSS fua/maximumAssembliesToPurge: 5000
DSS fua/maximumApplicationsToPurge: 10000
DSS maxResultToDeleteFromRuleSetIndexTable: 10000
DSS purgeAgeInDaysForRuleSetInRuleSetIndex: 90 days (about 3 months)
DSS purgeAgeInDaysForRulesInLogUsage: 7 days
DSS purgeAgeInHoursForRulesNotInLogUsage: 25 hours
See Solution 3.
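When reviewing an environment against this list, it can help to see which settings are still at their defaults. The following Python snippet is a rough sketch: the default values are the ones listed above, while the function name and the sample "current" values are hypothetical illustrations and are not recommended settings.

```python
from typing import Dict, List

# Defaults as listed in this document (days or hours as indicated by each name).
DEFAULT_PURGE_SETTINGS: Dict[str, int] = {
    "fua/maximumAssembliesToPurge": 5000,
    "fua/maximumApplicationsToPurge": 10000,
    "maxResultToDeleteFromRuleSetIndexTable": 10000,
    "purgeAgeInDaysForRuleSetInRuleSetIndex": 90,
    "purgeAgeInDaysForRulesInLogUsage": 7,
    "purgeAgeInHoursForRulesNotInLogUsage": 25,
}


def settings_still_at_default(current: Dict[str, int]) -> List[str]:
    """List the settings whose value equals the default.

    A setting that is missing from `current` is treated as being at its default.
    """
    return sorted(
        name
        for name, default in DEFAULT_PURGE_SETTINGS.items()
        if current.get(name, default) == default
    )


# Hypothetical values read from a non-production environment (illustration only).
current_values = {
    "fua/maximumAssembliesToPurge": 20000,
    "purgeAgeInDaysForRulesInLogUsage": 7,
}

unchanged = settings_still_at_default(current_values)
for name in unchanged:
    print(f"still at default: {name}")
```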
Purge runs fail because of database performance issues. When Purge does not run for some time, or does not delete enough entries from the cache tables, these tables grow. If the cache tables are too big, database operations slow down, which impedes performance of the entire system. SQL queries then start failing with socket timeout exceptions, and Purge starts failing because those SQL queries are failing.
Failed purges and excessive unused application branches are not effectively detected. This condition leads to reactive remediation. See Solution 4 for enhancements in future releases that will support proactive detection of failed purges and excessive branches.
Environments
The problem was reported in the following environments:
- Pega Platform™ version 7.3.1
- Pega Platform version 8.6.2 in Pega Cloud® services 2.22.4
- Pega Platform version 8.6.2 in Pega Cloud services 2.23.2
- Pega Platform version 8.6.2 in Pega Cloud services 2.24.4
- Pega Platform version 8.6.3 in Pega Cloud services 2.23.4
- Pega Platform version 8.6.3 in Pega Cloud services 2.23.2
- Pega Platform version 8.6.4
- Pega Platform version 8.6.5
- Pega Platform version 8.6.5 in Pega Cloud services 2.23.4
- Pega Platform version 8.6.5 in Pega Cloud services 2.24.4
- Pega Platform version 8.6.6
- Pega Platform version 8.6.6 in Pega Cloud services 2.24.10
- Pega Platform version 8.7.1
- Pega Platform version 8.7.1 in Pega Cloud services 2.23.4
- Pega Platform version 8.7.3 in Pega Cloud services 2.23.1
- Pega Platform version 8.8.2 in Pega Cloud services 2.24.10
Solutions
Choose the solution that corresponds to the root cause (see Explanations) of the symptoms that your Pega deployment experiences.
Solution 1 Conflicting Purge schedule
Go to My Support Portal and request HFIX-83910. See Creating a support ticket.
This hotfix adds the Pega-RulesEngine ruleset DSS fua/skipHasPurgeHappenedSinceStartupCheck. When set to true, the check for 'Has Purge Happened Since Startup' is skipped.
You can also set the DSS:
Owning Ruleset: Pega-RulesEngine
Purpose: fua/skipHasPurgeHappenedSinceStartupCheck
Value: true
Default value: false
In a future release, this DSS will be the default. Watch the Pega Support Center Pega Platform Resolved Issues.
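The effect of this DSS can be pictured as a simple gate in front of the purge run. The following Python snippet is a simplified model based only on the behavior described above. It is not the Pega implementation, and the function and parameter names are hypothetical.

```python
def should_run_purge(has_purge_happened_since_startup: bool,
                     skip_check: bool = False) -> bool:
    """Model of the purge gate described in this document.

    skip_check stands in for the DSS fua/skipHasPurgeHappenedSinceStartupCheck.
    When it is true, the startup check is bypassed and the purge proceeds.
    Otherwise the purge is skipped while HasPurgeHappenedSinceStartup is false.
    """
    if skip_check:
        return True
    return has_purge_happened_since_startup


# Default value (false): the purge is skipped after a restart.
print(should_run_purge(has_purge_happened_since_startup=False))

# DSS set to true: the check is bypassed and the purge runs.
print(should_run_purge(has_purge_happened_since_startup=False, skip_check=True))
```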
Solution 2 Purge schedule inadequate for non-production environments
The default schedule of the PurgeAssemblyDatabaseCache agent is to run on Sunday at 1:00 AM local time.
Change the schedule of the PurgeAssemblyDatabaseCache agent to run daily.
See Managing weekly reboots and the default schedule for the PurgeAssemblyDatabaseCache agent [SDR-A2], Solutions, Alternative solution.
In a future release, the daily schedule will be the default. Watch the Pega Support Center Pega Platform Resolved Issues and search for the issue that matches your Pega Platform version:
For Pega Platform version 8.7.4, search for ISSUE-738175.
For Pega Platform version 8.6.6, search for ISSUE-738176.
For Pega Platform version 8.8, search for ISSUE-738177.
Solution 3 Default DSSes inadequate for non-production environments
Adjust your settings using some of the DSSes documented in Managing weekly reboots and the default schedule for the PurgeAssemblyDatabaseCache agent [SDR-A2].
In a future release, the DSSes will be default settings. Watch the Pega Support Center Pega Platform Resolved Issues.
Solution 4 Weak detection of failed purges and excessive unused application branches
See the solutions in the two Support Documents:
Managing inactive branches in branched application development [SDR-A1]
Managing weekly reboots and the default schedule for the PurgeAssemblyDatabaseCache agent [SDR-A2]
A future release of Pega Platform will provide PDC alerts for failed purges of cache tables and for excessive branches in the system. Watch the Pega Support Center Pega Platform Resolved Issues.
Related content
Pega Documentation
Preassembling rules in an application
PEGA0038 alert: The wait time for rule cache access exceeds a threshold
Pega Support Documents
Do not modify or truncate critical System tables
Managing inactive branches in branched application development [SDR-A1]
Managing weekly reboots and the default schedule for the PurgeAssemblyDatabaseCache agent [SDR-A2]