Using a real-time data flow with a Kafka data set configured to 'Only read new records'
We are exploring how Pega works with a Kafka data set in a real-time data flow, specifically how it keeps track of which records have already been processed/read from the data set. Here is an example and our observations.
1] Configure a real-time data flow with a Kafka data set.
2] Set the read option to 'Only read new records'.
3] Using a Kafka producer, post a few messages to the topic configured in the Kafka data set; say you have posted 3 messages (a producer sketch follows this list).
4] Review the component statistics (data flow run stats): it shows 3 successful records.
5] Stop the data flow and post another message (the 4th message).
6] Restart the data flow and post another message (the 5th message).
7] Review the component statistics (data flow run stats): it has processed only the 5th message, which means the 4th message is lost or not processed. Is this expected behaviour? What is the definition of a 'new record': anything posted after the data flow is started/restarted, or everything posted since the last processed record? (The consumer sketch after this list illustrates these two possible offset semantics.)
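For reference, here is a minimal sketch of the kind of producer we used to post the test messages in steps 3, 5 and 6, written with the standard Apache Kafka Java client. The broker address (localhost:9092) and topic name (pega-test-topic) are placeholders, not our actual configuration; substitute the values from your Kafka data set.

// Minimal producer sketch for posting the test messages in step 3.
// Broker address and topic name below are placeholder assumptions.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TestProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Post 3 messages, as in step 3 of the scenario.
            for (int i = 1; i <= 3; i++) {
                producer.send(new ProducerRecord<>("pega-test-topic", "key-" + i,
                        "{\"message\": \"test message " + i + "\"}"));
            }
            producer.flush(); // make sure the messages reach the broker before exiting
        }
    }
}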
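To make the question in step 7 concrete in plain Kafka terms: the two possible definitions of 'new record' correspond to two different consumer offset strategies. The sketch below is not Pega's implementation, only an illustration of the two behaviours using the standard Java client; the broker, topic, and group id are placeholder assumptions as above.

// Sketch contrasting the two possible meanings of "new record" using a plain
// Kafka consumer. This is an illustration of the question, not Pega's code.
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OffsetSemanticsDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "dataflow-demo-group");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        // Semantics A ("new" = posted after the consumer starts/restarts):
        // if the consumer starts with no committed offset for its group and
        // auto.offset.reset=latest, it seeks to the end of the topic, so a
        // message posted while it was stopped (the 4th message) is never
        // delivered -- matching the behaviour we observed in step 7.
        props.put("auto.offset.reset", "latest");

        // Semantics B ("new" = everything since the last processed record):
        // a stable group.id plus committed offsets (auto commit, or an
        // explicit commitSync() after processing each poll) makes the
        // consumer resume at the committed offset on restart, so the 4th
        // message would be delivered.

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("pega-test-topic"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}

If Pega's 'Only read new records' option follows semantics A, the observed loss of the 4th message would be expected; if it follows semantics B, it would not. That is essentially what we are asking.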
We have raised a support request with Pega for the messages getting lost in the above scenario. The GCS team suggested raising this on the support community, hence this post.