Creating a Batch Job Framework in Pega
What problem does it solve?
Data issues are common on projects. When they occur, a developer usually has to build a utility that processes and updates the affected records. Sometimes there are too many records to process at once without impacting the system, so the processing has to be designed with system load in mind. In addition, with large data sets it's hard to keep track of which rows were processed successfully and which failed and must be processed again.
Overview
The Batch Job framework was implemented to make custom batch processes easier and faster to build. The framework records every processed record in a database table, which makes debugging easier and shows the progress of an execution immediately. It also uses queue processing, which makes processing large data sets much more efficient.
Furthermore, the framework is highly customizable and is applicable to nearly all data types in Pega. We can process a list of cases under a work class, or a list of data records under a data class. It's important to note that whichever table we pull records from for processing, that table must have a primary key.
Pros:
- Highly reusable
- Simple and quick to create new Batch Jobs
- Makes use of queue processing
- Tracks the status for each row processed
Cons:
- Time and effort required to implement the framework
Architecture
Extension points
ABC-Data-BatchJobTracker-BJXXX* (class) – Abstract class that contains the BatchProcessExtn activity along with any other batch-specific rules. The class name must follow the pattern ABC-Data-BatchJobTracker-BJXXX, where XXX is the unique batch job ID defined by the developer.
BatchReportForBJXXX* (report) – Report that returns the rows to be processed. The report must join onto the BatchJobTracker class using two filter conditions shown below. It must also have at least one filter on the ReferenceID property that checks it is null, which means the item has not been processed yet. The report name must follow the pattern BatchReportForBJXXX, where XXX is the unique batch job ID defined by the developer.
BatchProcessExtn* (activity) – Custom logic for processing an individual record (for example, a case).
GetBatchJobObj (activity) – Activity that assigns the object key. By default, it sets the pzInsKey of the record to be processed. It can be overridden in the newly created abstract class.
*Required to implement a new batch process
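The join and null filter that the report relies on can be illustrated outside Pega with plain SQL. The sketch below uses Python's built-in sqlite3 module. The table names (work_items, batch_job_tracker), the column names, and the two join conditions (matching the record key to ReferenceID and restricting to a single BatchName) are assumptions made for illustration, not the framework's actual report definition.

```python
import sqlite3

def unprocessed_keys(conn, batch_name, limit):
    """Return keys of source rows that have no tracker row for this batch."""
    sql = """
        SELECT w.pz_ins_key
        FROM work_items AS w
        LEFT JOIN batch_job_tracker AS t
               ON t.reference_id = w.pz_ins_key  -- assumed join condition 1
              AND t.batch_name = ?               -- assumed join condition 2
        WHERE t.reference_id IS NULL             -- no tracker row: not processed yet
        ORDER BY w.pz_ins_key
        LIMIT ?
    """
    return [row[0] for row in conn.execute(sql, (batch_name, limit))]

# Hypothetical in-memory data: W-1 was already processed by BJ123,
# W-2 was processed by a different batch, W-3 was never processed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE work_items (pz_ins_key TEXT PRIMARY KEY);
    CREATE TABLE batch_job_tracker (
        reference_id TEXT, batch_name TEXT,
        processed_timestamp TEXT, status TEXT
    );
    INSERT INTO work_items VALUES ('W-1'), ('W-2'), ('W-3');
    INSERT INTO batch_job_tracker VALUES
        ('W-1', 'BJ123', '2024-01-01T00:00:00', 'Success'),
        ('W-2', 'BJ999', '2024-01-01T00:00:00', 'Success');
""")

pending = unprocessed_keys(conn, "BJ123", 10)
print(pending)
```

Because the batch name is part of the join rather than the WHERE clause, a record tracked by another batch (W-2 here) still counts as unprocessed for BJ123.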
How it works
To implement a new batch process using the framework, follow these steps:
- Create a new abstract class inheriting from the concrete BatchJobTracker class. Example: ABC-Data-BatchJobTracker-BJ123
- Create a report in the desired class to gather the list of records to be processed. Example: BatchReportForBJ123
  - Remember to add the join and the filter condition to the report.
- Create the BatchProcessExtn activity with the custom logic, adding error handling as needed.
- Execute the BatchJob activity using the correct parameters.
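The steps above follow a common extension pattern: a base class supplies defaults and each batch job overrides only what it needs. The Python sketch below mirrors that idea outside Pega. The class and method names are stand-ins for the Pega rules, and the "Country" clean-up logic, the registry dictionary, and the status strings are hypothetical.

```python
class BatchJobTracker:
    """Stand-in for the concrete ABC-Data-BatchJobTracker class."""

    def get_batch_job_obj(self, report_row):
        # Default behaviour: use the record's pzInsKey as the object key.
        return report_row["pzInsKey"]

    def batch_process_extn(self, record):
        # Empty in the framework; each batch job overrides it.
        # Returning "Success" here is an assumption of this sketch.
        return "Success"


class BJ123(BatchJobTracker):
    """Hypothetical batch job that normalises a 'Country' value."""

    def batch_process_extn(self, record):
        try:
            record["Country"] = record["Country"].strip().upper()
            return "Success"
        except (KeyError, AttributeError) as exc:
            # Error handling: report the failure as a status instead of raising.
            return f"Failed: {type(exc).__name__}"


# Registry standing in for resolving the batch class from BatchJobName.
BATCH_JOBS = {"BJ123": BJ123}

def resolve_batch_class(batch_job_name):
    return BATCH_JOBS[batch_job_name]()

job = resolve_batch_class("BJ123")
good = {"pzInsKey": "W-1", "Country": " pl "}
bad = {"pzInsKey": "W-2"}  # missing the property the job expects

good_status = job.batch_process_extn(good)
bad_status = job.batch_process_extn(bad)
print(job.get_batch_job_obj(good), good_status, good["Country"])
print(job.get_batch_job_obj(bad), bad_status)
```

Returning a status rather than raising keeps one bad record from stopping the rest of the batch, which matches the framework's goal of tracking the outcome for each row.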
Future considerations
Once the framework is up and running, the BatchJobTracker table will quickly fill with data. You can create a utility activity that runs periodically to clean up the table.
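A clean-up utility of this kind could look roughly like the following sketch, written in Python against sqlite3 rather than as a Pega activity. The table and column names, the ISO 8601 timestamp format, the retention period, and the choice to purge only successful rows are all assumptions.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def purge_tracker_rows(conn, retention_days, now=None):
    """Delete successful tracker rows older than the retention period."""
    now = now or datetime.now(timezone.utc)
    # ISO 8601 strings in one fixed format sort chronologically,
    # so a plain string comparison is enough here.
    cutoff = (now - timedelta(days=retention_days)).strftime("%Y-%m-%dT%H:%M:%S")
    cur = conn.execute(
        "DELETE FROM batch_job_tracker "
        "WHERE status = 'Success' AND processed_timestamp < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE batch_job_tracker ("
    "reference_id TEXT, batch_name TEXT, processed_timestamp TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO batch_job_tracker VALUES (?, ?, ?, ?)",
    [
        ("W-1", "BJ123", "2024-01-01T00:00:00", "Success"),  # old, successful
        ("W-2", "BJ123", "2024-01-01T00:00:00", "Failed"),   # old, but failed
        ("W-3", "BJ123", "2024-03-01T00:00:00", "Success"),  # recent
    ],
)

removed = purge_tracker_rows(
    conn, retention_days=30, now=datetime(2024, 3, 10, tzinfo=timezone.utc)
)
remaining = [r[0] for r in conn.execute(
    "SELECT reference_id FROM batch_job_tracker ORDER BY reference_id"
)]
print(removed, remaining)
```

Keep in mind that the batch report treats a record without a tracker row as unprocessed, so a purged record becomes eligible for the same batch again if the report still selects it.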
To make the framework more user friendly, you can build a batch execution case type that takes the parameters from the user and creates an individual case for each batch job execution. A Job Scheduler can then run periodically, pick up the batch execution cases in Open-Active status, and call the Batch Job framework with the parameters set on each case. This allows more customization, for example additional parameters that restrict execution to off-peak hours.
In situations where you have multiple applications and strict security configurations you can expand the framework with an additional parameter “AccessGroup”. This parameter can be passed down to the QueueBatchObj activity which queues the records for processing using the dedicated queue processor. In the advanced configuration you can set the alternate access group to take the new parameter.
Rules needed
- ABC-Data-BatchJobTracker (class) – Concrete class that stores data for the Batch Job executions. This class contains the following columns:
  - ReferenceID (string) – key of the object to be processed
  - BatchName (key) – name of the batch (e.g. BJ123)
  - ProcessedTimestamp (string) – timestamp of the execution
  - Status (string) – status of the processed row
- BatchJob (activity) – Main activity used to trigger the batch job execution. It calculates the correct class (QueueBatchClass) and prepares a temp page for processing before calling the QueueBatch activity. The activity takes the following parameters:
  - BatchJobName (string) – name of the batch job; this is the name of the subclass of ABC-Data-BatchJobTracker (e.g. BJ123)
  - ReportName (string) – report name
  - ReportClass (string) – report class
  - NumberOfRecords (integer) – number of records that will be processed
  - ProcessImmediately (boolean) – if checked, runs without the queue processor
- QueueBatch (activity) – Calls the appropriate report based on the parameters and iterates over the pxResults, creating a BatchJobObject page for each record. For each record it first calls the GetBatchJobObj activity, followed by the QueueBatchObj activity. The activity takes the same parameters as the BatchJob activity. In addition, it takes:
  - QueueBatchClass (string) – class of the specialized batch job
- GetBatchJobObj (activity) – Initializes the BatchJobObject page and sets its pzInsKey to the row key from the report. The activity takes the following parameters:
  - BatchJobObject (page name) – page for the batch job object to be processed
  - QueueBatchReportItem (page name) – page with the item from the report
  - Status (string) – output parameter that returns the status of the activity
- QueueBatchObj (activity) – Depending on the ProcessImmediately parameter, calls the BatchProcess activity either directly or through the dedicated queue processor, in the context of the BatchJobObject. The activity takes the following parameters:
  - BatchJobName (string) – name of the batch job
  - BatchJobObject (page name) – page for the batch job object to be processed
  - ProcessImmediately (boolean) – if checked, runs without the queue processor
  - QueueBatchClass (string) – class of the specialized batch job
- BatchProcess (activity) – Implemented on the @baseclass layer. The activity uses the BatchJobObject page and opens the object using the key stored on the page. It then calls the BatchProcessExtn activity, followed by the UpdateBatchJobTracker activity. The activity takes the following parameters:
  - BatchJobName (string) – name of the batch job
  - ProcessImmediately (boolean) – if checked, runs without the queue processor
  - QueueBatchClass (string) – class of the specialized batch job
- BatchProcessExtn (activity) – Empty activity that is overridden in the batch-specific class. The activity takes the following parameters:
  - BatchJobName (string) – name of the batch job
  - BatchJobObject (page name) – page for the batch job object to be processed
  - ProcessImmediately (boolean) – if checked, runs without the queue processor
  - Status (string) – output parameter that returns the status of the activity
- UpdateBatchJobTracker (activity) – Updates the BatchJobTracker table with the status and the timestamp. The activity takes the following parameters:
  - BatchJobTrackerHandle (string) – key of the item being processed
  - ProcessedTimestamp (string) – timestamp of the execution
  - Status (string) – input parameter that carries the status to record for the processed row
- BatchJobProcess (dedicated queue processor) – Queues items to be processed by the BatchProcess activity.
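The way these rules call each other can be sketched in a few lines of Python. This is a simplified model, not the Pega implementation: the in-memory queue, the records dictionary, the "amount" rule in the extension, and writing the tracker row only at update time are all assumptions of the sketch.

```python
from collections import deque
from datetime import datetime, timezone

tracker = []     # stands in for the BatchJobTracker table
queue = deque()  # stands in for the BatchJobProcess queue processor
records = {"W-1": {"amount": 10}, "W-2": {"amount": -5}}  # hypothetical data

def batch_process_extn(record):
    # Hypothetical batch-specific logic: negative amounts count as failures.
    return "Success" if record["amount"] >= 0 else "Failed"

def update_batch_job_tracker(reference_id, batch_job_name, status):
    tracker.append({
        "ReferenceID": reference_id,
        "BatchName": batch_job_name,
        "ProcessedTimestamp": datetime.now(timezone.utc).isoformat(),
        "Status": status,
    })

def batch_process(batch_job_obj, batch_job_name):
    record = records[batch_job_obj["pzInsKey"]]  # open the object by its key
    status = batch_process_extn(record)
    update_batch_job_tracker(batch_job_obj["pzInsKey"], batch_job_name, status)

def queue_batch_obj(batch_job_obj, batch_job_name, process_immediately):
    if process_immediately:
        batch_process(batch_job_obj, batch_job_name)    # run inline
    else:
        queue.append((batch_job_obj, batch_job_name))   # defer to the queue

def queue_batch(report_rows, batch_job_name, number_of_records, process_immediately):
    for row in report_rows[:number_of_records]:
        batch_job_obj = {"pzInsKey": row["pzInsKey"]}   # GetBatchJobObj default
        queue_batch_obj(batch_job_obj, batch_job_name, process_immediately)

def drain_queue():
    # Plays the role of the queue processor picking up items.
    while queue:
        batch_process(*queue.popleft())

rows = [{"pzInsKey": "W-1"}, {"pzInsKey": "W-2"}]
queue_batch(rows, "BJ123", number_of_records=10, process_immediately=False)
queued = len(queue)
drain_queue()
statuses = [(t["ReferenceID"], t["Status"]) for t in tracker]
print(queued, statuses)
```

With process_immediately=True the same records would be processed inline and nothing would be placed on the queue, which is convenient for testing a new batch job on a handful of rows.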