We have identified an issue with unexpectedly triggering the backfill process when working with Interaction History (IH) Summaries and the aggregated data they contain. While the backfill process runs, the aggregations used in engagement policies are not accurate.
Issue: Deploying changes to IH-based Summary Data Sets stops aggregation and starts replaying IH records from files in the repository to recalculate the aggregates. The duration of this replay depends on the volume of IH records stored in S3 and may be a few days. During this process, no IH Summary Data Set has current aggregate values, and engagement policies may not work as expected.
What is this Support document about?
If you are unfamiliar with how Customer Decision Hub (CDH) utilizes aggregates, let us provide a brief explanation. The Next Best Action (NBA) Designer Strategy framework, which ships with CDH out of the box, relies on a concept called Summaries to determine whether a given customer can receive a piece of content.
For example, if a customer was shown a Credit Card offer twice in the past month, CDH keeps up to date an IH Summary that counts the number of times the offer was shown to that customer in the past 30 days. If you wanted to change your marketing strategy to count the times this offer was shown in the past 35 days instead, you would modify the Summary (an advanced CDH configuration that is possible but not recommended). As soon as you save this modification, CDH recognizes that it needs to recalculate the Summary by re-reading IH from the past 35 days to re-initialize the Summary’s current value.
This advisory informs you that if you make such a change (or press a button that causes a Summary to be recalculated for any of the reasons listed below), the Summary could take a day or longer to recalculate. During that time, the Summary’s value is indeterminate, which can lead to incorrect decisions. We are proactively notifying our clients to prevent surprises in decision making.
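The following minimal sketch illustrates why changing the window forces a re-read of history. It is not CDH code: the function `count_in_window` and the sample impression dates are hypothetical, and a real IH Summary is maintained incrementally rather than by scanning a list.

```python
from datetime import datetime, timedelta


def count_in_window(impressions, now, window_days):
    """Count impressions whose timestamp falls within the last `window_days` days."""
    cutoff = now - timedelta(days=window_days)
    return sum(1 for ts in impressions if cutoff < ts <= now)


now = datetime(2024, 6, 30)

# Hypothetical impression history for one customer and one offer.
impressions = [
    now - timedelta(days=33),
    now - timedelta(days=20),
    now - timedelta(days=5),
]

count_30 = count_in_window(impressions, now, 30)  # only the 20- and 5-day-old impressions
count_35 = count_in_window(impressions, now, 35)  # the 33-day-old impression now counts too
```

A 30-day aggregate no longer holds any information about the impression from 33 days ago, so the 35-day value cannot be derived from the existing aggregate. It has to be rebuilt from the underlying IH records, which is what the replay does.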
Which actions will cause recalculation of all aggregates?
- Changing an existing materialized IH Summary Data Set. This applies both to editing IH Summary Data Sets in Dev Studio and to editing IH Summaries in CDH (using Customer Profile Designer, which generates IH Summary Data Sets). These changes include:
  - Adding or removing aggregates from existing IH Summary Data Sets
  - Updating aggregates in Summary Data Sets. This includes:
    - Updating the aggregate filter
    - Changing the window size of the aggregate
    - Changing the aggregate type
    - Changing the output or input property of the aggregate
  - Changing the list of additional keys
  - Changing the global filter on the Summary Data Set
- Creating a new IH Summary in CDH or a new IH Summary Data Set in Designer Studio
- Clicking “Recreate aggregates” in the Decisioning Data Sources, Interaction history summaries view in Designer Studio
- Manually stopping or resuming the “Aggregates_Interaction History” or “BF_Interaction History” run in Designer Studio
For further information about Interaction History Summaries and their change management, see: Change management in interaction history summaries
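As an illustration only, the configuration changes listed above can be expressed as a pre-deployment check that compares two versions of a Summary definition. This is a sketch under stated assumptions: the dictionary layout, the field names (`aggregates`, `additional_keys`, `global_filter`) and the function `triggers_recalculation` are hypothetical and are not part of any Pega API. The platform itself decides when a recalculation is needed.

```python
import copy

# Hypothetical top-level fields of a Summary definition. Any difference in
# these fields corresponds to one of the configuration changes listed above.
BACKFILL_FIELDS = ("aggregates", "additional_keys", "global_filter")


def triggers_recalculation(old, new):
    """Return the reasons why deploying `new` over `old` would recalculate aggregates."""
    if old is None:
        # Creating a new IH Summary is itself a trigger.
        return ["new summary"]
    return [field for field in BACKFILL_FIELDS if old.get(field) != new.get(field)]


# Hypothetical definition: one aggregate counting impressions over 30 days.
old_summary = {
    "aggregates": {
        "ShownCount": {
            "type": "count",
            "filter": "Outcome == 'Impression'",
            "window_days": 30,
            "output": "ShownCount",
        }
    },
    "additional_keys": ["Channel"],
    "global_filter": None,
}

new_summary = copy.deepcopy(old_summary)
new_summary["aggregates"]["ShownCount"]["window_days"] = 35  # window size change

changed = triggers_recalculation(old_summary, new_summary)
unchanged = triggers_recalculation(old_summary, copy.deepcopy(old_summary))
created = triggers_recalculation(None, new_summary)
```

A check of this kind could run in a review step before deployment, so that a change that triggers a replay is scheduled deliberately rather than discovered afterwards. It does not cover the manual triggers, such as clicking “Recreate aggregates” or stopping and resuming the runs.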
What happens if any of the actions above are performed?
- Data aggregated within the Summary Data Sets will be lost.
- For Summary Data Sets based on sources other than IH, all aggregates within Summary Data Sets will be calculated from the moment of the deployment of changes.
- For Summary Data Sets based on IH (pxResponsesStream), all aggregates within the Summary Data Sets will be recalculated from the IH records exported to the File Repository. During that process, none of the other Summary Data Sets based on IH will be aggregated, which means that their aggregates will not be up to date. This will impact other components (for example, contact policies and custom Strategies that use IH Summary Data Sets).
How long will Replay of IH take?
This depends on the amount of data, the number of Summary Data Sets that must be recalculated, and the window size in these Summary Data Sets. If there are more than 100 million interactions to replay, it may take more than 24 hours for one data set.
To understand how many interactions the backfill will process, log in to the PDC telemetry service. Under Usage Metrics -> CDH And Decisioning, open the "Other key metrics" tab and check the numbers for capture Response.
If you store records in IH other than interaction responses (for example, the offers sent), you can get a more accurate number by running GetStatistics on the pyIHSummary Data Set and estimating from the total processed records and the date when the first record was processed.
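The estimate described above can be sketched as simple arithmetic. Everything in this example is an assumption made for illustration: the function names, the sample figures, the idea that only records inside the longest window are replayed, and the throughput, which is derived from the "100 million interactions in more than 24 hours" figure and is therefore optimistic. Treat the result as a rough lower bound, not a measured value.

```python
from datetime import date


def estimate_records_to_replay(total_processed, first_record_date, today, window_days):
    """Estimate the records inside the window from the average daily ingestion rate."""
    days_active = max((today - first_record_date).days, 1)
    daily_rate = total_processed / days_active
    # Assumption: only records inside the longest aggregate window are replayed.
    return daily_rate * min(window_days, days_active)


def estimate_replay_hours(records, records_per_hour):
    """Convert a record count into hours at an assumed constant throughput."""
    return records / records_per_hour


# Assumption: 100 million records per 24 hours, taken from the figure above.
ASSUMED_RECORDS_PER_HOUR = 100_000_000 / 24

# Hypothetical statistics: total processed records and first-record date.
records = estimate_records_to_replay(
    total_processed=731_000_000,
    first_record_date=date(2022, 7, 1),
    today=date(2024, 7, 1),
    window_days=90,
)
hours = estimate_replay_hours(records, ASSUMED_RECORDS_PER_HOUR)
print(f"~{records:,.0f} records, at least ~{hours:.1f} hours per data set")
```

With these sample figures the average rate is one million records per day, so a 90-day window holds about 90 million records. Actual replay time also depends on the number of Summary Data Sets being recalculated.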
We advise you to deploy IH Summary-related changes at the beginning of a weekend.
Pega’s steps to resolve this issue:
- We took immediate action to improve awareness of the issue by publishing a client advisory and updating the Pega documentation to state the issue more explicitly.
- We will deliver changes in the next version, 24.2, that increase the default number of partitions to improve the performance of the Backfill process.
- We will improve awareness both in the platform and in CDH by warning the user when the backfill process is about to be triggered. These changes are also expected in 24.2.
- We will prevent users from manually starting and stopping pre-aggregation and backfill.
- We are planning mid-term changes to improve process performance and Summaries change management, to reduce the number of times the Backfill process needs to be performed.