Troubleshooting PurgeAssemblyDatabaseCache agent failure [SDR-A101]

Support Doc

MaryCarbonara

Posted: 27 Oct 2023 14:39 EDT
Last activity: 28 Nov 2023 22:09 EST

Contributed by Michael Cunanan and Rohit Patel

Symptoms
Errors
Explanations
Environments
Solutions
Related content

Symptoms

Full-system outage with a high volume of PEGA0037 Alerts for certain rules.

The Purge Agent did not finish processing and logged the message "skipping purge because HasPurgeHappenedSinceStartup == false". This error occurs if the server is restarted within two (2) weeks. Your server restart schedule might not allow for that interval.

CPU usage in development environments spiked to 100 percent, which prevented users from working and had a major business impact.

The cache tables and tmp files became bloated and continued to grow, causing system slowness. Purge either did not run or, when it did run, could not delete enough data. SQL queries were failing due to socket timeouts.

Purge activity was failing to clear rule cache tables. In addition, the system had many used rules and unused application branches that caused the table pr4_rules_vw to keep growing.

Purge was scheduled to run weekly, and server restarts also happened weekly. The two schedules conflicted, causing Purge runs to be skipped.

Purge was not running daily.

Database operations were very slow.

Errors

PEGA0037 Alerts

Log message:
skipping purge because HasPurgeHappenedSinceStartup == false
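
One way to confirm this symptom is to count how often the message appears in the logs. The following is a minimal sketch only: the message text is taken from this document, while the function name and the sample log lines are hypothetical, and real Pega log lines have a different layout.

```python
# Message text as quoted in this document.
SKIP_MESSAGE = "skipping purge because HasPurgeHappenedSinceStartup == false"


def count_skipped_purges(log_lines):
    """Count the log lines that contain the purge-skip message."""
    return sum(1 for line in log_lines if SKIP_MESSAGE in line)


# Hypothetical log excerpt, for illustration only.
sample_log = [
    "2023-10-01 01:00:00 INFO agent started",
    "2023-10-01 01:00:01 INFO skipping purge because HasPurgeHappenedSinceStartup == false",
    "2023-10-08 01:00:01 INFO skipping purge because HasPurgeHappenedSinceStartup == false",
]

print(count_skipped_purges(sample_log))
```

A count that rises by one with every scheduled run suggests that the Purge is being skipped each time rather than failing occasionally.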

Explanations

The local site policy restarted nodes every week, which did not align with the system default purge policy. As a result, the Purge Agent did not finish processing and logged the message "skipping purge because HasPurgeHappenedSinceStartup == false". See Solution 1.

The Pega-provided default Purge schedule frequency for non-production environments is inadequate. See Solution 2.

A Purge schedule that conflicts with the weekly reboot causes Purge runs to be skipped. See Solution 1.
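
The schedule conflict can be illustrated with a small simulation. This sketch rests on an assumed model inferred from the log message, not on Pega's actual implementation: a restart clears the HasPurgeHappenedSinceStartup flag, the first scheduled run after a restart is skipped and sets the flag, and later runs purge. The function and parameter names are hypothetical.

```python
def simulate(days, purge_every, restart_every):
    """Return (purged, skipped) run counts over a number of days.

    Assumed model (not confirmed Pega behavior): a restart clears the
    flag; a scheduled run that finds the flag cleared is skipped and
    sets the flag; a run that finds the flag set performs the purge.
    """
    flag = False
    purged = skipped = 0
    for day in range(days):
        if day % restart_every == 0:
            flag = False  # node restart clears the flag
        if day % purge_every == 0:
            if flag:
                purged += 1
            else:
                skipped += 1
                flag = True
    return purged, skipped


# Weekly purge with weekly restart: every run is the first after a restart.
print(simulate(28, purge_every=7, restart_every=7))  # (0, 4)

# Daily purge with weekly restart: only the first run of each week is skipped.
print(simulate(28, purge_every=1, restart_every=7))  # (24, 4)
```

Under this model, a weekly Purge combined with a weekly restart never purges, while a daily Purge loses only one run per restart.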

The Pega-provided default settings for purging records are insufficient to keep up with the inflow of records in non-production environments. In some cases, we have observed that Purge runs but does not clear enough data. The following default settings are adequate for production environments, where there is no high-volume development activity and no server restarts, but they are inadequate for non-production environments.

DSS fua/maximumAssembliesToPurge: 5000

DSS fua/maximumApplicationsToPurge: 10000

DSS maxResultToDeleteFromRuleSetIndexTable: 10000

DSS purgeAgeInDaysForRuleSetInRuleSetIndex: 90 days (about 3 months)

DSS purgeAgeInDaysForRulesInLogUsage: 7 days

DSS purgeAgeInHoursForRulesNotInLogUsage: 25 hours

See Solution 3.
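
A rough calculation shows why these limits can fall behind. The sketch below assumes that fua/maximumAssembliesToPurge acts as a cap on the number of entries removed per Purge run, which this document does not state explicitly. The backlog size is a hypothetical figure, and the calculation ignores new entries that arrive between runs.

```python
import math

# Default values as listed in this document.
DEFAULTS = {
    "fua/maximumAssembliesToPurge": 5000,
    "fua/maximumApplicationsToPurge": 10000,
    "maxResultToDeleteFromRuleSetIndexTable": 10000,
}


def runs_to_clear(backlog, per_run_cap):
    """Number of Purge runs needed to clear a backlog, assuming a per-run cap."""
    return math.ceil(backlog / per_run_cap)


backlog = 1_000_000  # hypothetical number of stale assembly entries
runs = runs_to_clear(backlog, DEFAULTS["fua/maximumAssembliesToPurge"])
print(runs)  # 200
```

At one successful run per week, 200 runs would take close to four years. At one run per day, the same backlog would take about 200 days, which is why both the schedule and the limits matter.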

Purge runs fail because of database performance issues. When Purge does not run for some time, or does not delete enough entries from the cache tables, those tables grow. Oversized cache tables slow down database operations, which impedes the performance of the entire system. SQL queries then start failing with Socket Timeout exceptions, and Purge itself fails because its SQL queries fail.

Failed purges and excessive unused application branches are not effectively detected. This condition leads to reactive remediation. See Solution 4 for enhancements in future releases that will support proactive detection of failed purges and excessive branches.

Environments

The problem was reported in the following environments:

Pega Platform™ version 7.3.1
Pega Platform version 8.6.2 in Pega Cloud® services 2.22.4
Pega Platform version 8.6.2 in Pega Cloud services 2.23.2
Pega Platform version 8.6.2 in Pega Cloud services 2.24.4
Pega Platform version 8.6.3 in Pega Cloud services 2.23.4
Pega Platform version 8.6.3 in Pega Cloud services 2.23.2
Pega Platform version 8.6.4
Pega Platform version 8.6.5
Pega Platform version 8.6.5 in Pega Cloud services 2.23.4
Pega Platform version 8.6.5 in Pega Cloud services 2.24.4
Pega Platform version 8.6.6
Pega Platform version 8.6.6 in Pega Cloud services 2.24.10
Pega Platform version 8.7.1
Pega Platform version 8.7.1 in Pega Cloud services 2.23.4
Pega Platform version 8.7.3 in Pega Cloud services 2.23.1
Pega Platform version 8.8.2 in Pega Cloud services 2.24.10

Solutions

Choose the solution that matches the root cause (see Explanations) of the symptoms that your Pega deployment experiences.

Solution 1 Conflicting Purge schedule

Go to My Support Portal and request HFIX-83910. See Creating a support ticket.

This hotfix adds the Pega-RulesEngine ruleset DSS fua/skipHasPurgeHappenedSinceStartupCheck. When set to true, the check for 'Has Purge Happened Since Startup' is skipped.

You can also set the DSS:

Owning Ruleset: Pega-RulesEngine

Purpose: fua/skipHasPurgeHappenedSinceStartupCheck

Value: true

Default value: false
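
The effect of the DSS can be pictured as a simple decision. This is a sketch of the logic as described in this document, not Pega's code; the function and parameter names are hypothetical.

```python
def purge_should_run(has_purge_happened_since_startup, skip_check=False):
    """Decide whether a scheduled Purge run proceeds.

    skip_check stands in for fua/skipHasPurgeHappenedSinceStartupCheck
    (default false). The logic is an assumption drawn from the text above.
    """
    if skip_check:
        # The 'Has Purge Happened Since Startup' check is bypassed.
        return True
    return has_purge_happened_since_startup


# Default behavior: the first run after a restart is skipped.
print(purge_should_run(False))  # False

# With the DSS set to true, the same run proceeds.
print(purge_should_run(False, skip_check=True))  # True
```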

In a future release, this DSS value will be the default. Watch the Pega Support Center Pega Platform Resolved Issues.

Solution 2 Purge schedule inadequate for non-production environments

The default schedule of the PurgeAssemblyDatabaseCache agent is to run on Sunday at 1:00 AM local time.

Change the default schedule of the PurgeAssemblyDatabaseCache agent to run daily.

See Managing weekly reboots and the default schedule for the PurgeAssemblyDatabaseCache agent [SDR-A2], Solutions, Alternative solution.

In a future release, this setting will be the default. Watch the Pega Support Center Pega Platform Resolved Issues and search for the issue that corresponds to your Pega Platform version:

For Pega Platform version 8.7.4, search for ISSUE-738175.

For Pega Platform version 8.6.6, search for ISSUE-738176.

For Pega Platform version 8.8, search for ISSUE-738177.

Solution 3 Default DSSes inadequate for non-production environments

Adjust your settings using some of the DSSes documented in Managing weekly reboots and the default schedule for the PurgeAssemblyDatabaseCache agent [SDR-A2].

In a future release, the DSSes will be default settings. Watch the Pega Support Center Pega Platform Resolved Issues.

Solution 4 Weak detection of failed purges and excessive unused application branches

See the solutions in the two Support Documents:

Managing inactive branches in branched application development [SDR-A1]

Managing weekly reboots and the default schedule for the PurgeAssemblyDatabaseCache agent [SDR-A2]

A future release of the Pega Platform will provide PDC alerts for failed purges of cache tables and for excessive branches in the system. Watch the Pega Support Center Pega Platform Resolved Issues.
