Real-time data flow consuming a Kafka or Stream data set to read existing and new records
We currently use real-time data flows that consume from Kafka or Stream data sets. These data flows are configured to read only the new records that arrive ("Only read new records").
The problem with this configuration is that Pega does not process records that arrive while the data flow is stopped, which means we lose them.
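To make the behaviour we observe concrete, here is a minimal plain-Kafka sketch of what we assume "Only read new records" amounts to for the underlying consumer: on every start it jumps to the end of each partition, so anything published while the data flow was stopped is never read. This is not Pega's actual implementation; the broker address, group id, and topic name are placeholders.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class OnlyNewRecordsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("group.id", "df-only-new-sketch");         // hypothetical group id
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            String topic = "customer-events";                 // hypothetical topic
            List<TopicPartition> partitions = consumer.partitionsFor(topic).stream()
                    .map(p -> new TopicPartition(topic, p.partition()))
                    .collect(Collectors.toList());
            consumer.assign(partitions);
            // Jump to the current end of every partition: anything that arrived while
            // this consumer (the data flow) was stopped is skipped, mirroring the
            // record loss described above.
            consumer.seekToEnd(partitions);
            while (true) {
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(1))) {
                    System.out.printf("offset=%d ts=%d value=%s%n",
                                      r.offset(), r.timestamp(), r.value());
                }
            }
        }
    }
}
```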
The other configuration option is to process both the records that already exist and the new ones ("read existing and new records"). With this option we would not lose records sent while the data flow is stopped, but we run the risk of processing the same record several times. Before we consider changing the configuration, we need to be clear about how this second option works:
- Is it possible to process records that are 2 years old? Is any criterion applied when reprocessing these records, or is absolutely everything processed regardless of when it was sent?
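For comparison, here is a plain-Kafka sketch (an assumption about the underlying consumer semantics, not a statement about Pega internals) of what "read existing and new records" usually means: with no committed offset, the consumer starts at the earliest offset the broker still retains, so everything within the topic's retention window is replayed regardless of business date, and nothing older than the retention limit is available at all. Because delivery is at-least-once, the handler below deduplicates by record key; the topic name and key scheme are hypothetical.

```java
import java.time.Duration;
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ExistingAndNewRecordsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");    // assumed broker address
        props.put("group.id", "df-existing-and-new-sketch");  // hypothetical group id
        props.put("enable.auto.commit", "false");              // commit only after processing
        props.put("auto.offset.reset", "earliest");            // replay everything still retained
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        Set<String> processedKeys = new HashSet<>();            // naive in-memory dedup store
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("customer-events"));     // hypothetical topic
            while (true) {
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(1))) {
                    // Skip records already handled: guards against the duplicate
                    // processing mentioned above when old offsets are replayed.
                    if (!processedKeys.add(r.key())) {
                        continue;
                    }
                    System.out.printf("processing offset=%d key=%s%n", r.offset(), r.key());
                }
                consumer.commitSync();                           // mark the batch as processed
            }
        }
    }
}
```

In this plain-Kafka reading, a 2-year-old record would only be reprocessed if the topic's retention settings (retention.ms / retention.bytes) still keep it on the broker; whether Pega applies any additional date-based criterion on top of that is exactly what we would like to confirm.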