Question


Areteans Technology Solutions
AU
Last activity: 25 Apr 2023 18:54 EDT
Externalisation of Kafka service
The official documentation here says that, moving forward, Pega will no longer support embedded Kafka. What does this mean from a Queue Processor standpoint? The academy topic about Queue Processors here says:
As soon as you run a queue processor, the system creates a topic that corresponds to this queue processor in the Kafka server. Based on the number of partitions mentioned in the server.properties file, the system creates the same number of folders in the tomcat\Kafka-data folder.
At least one stream node is necessary for the system to queue messages to the Kafka server. If you do not define a stream node in a cluster, the system queues items to the database, and then these items are processed when a stream node is available.
This makes sense, as the nodes classified as stream nodes still have access to the rules engine and the database, which allows the queued events to be executed.
- When the stream nodes are externalised, where do these queue processors run?
- How would they get access to the rules engine and the database, given that the external Kafka nodes are not part of the Pega platform?
- Does this change mean that, moving forward, the Pega platform MUST be connected to external Kafka services for queue processors to function?
Thanks,
Ratan Balaji


NCS Pte. Ltd
SG
Hi @Ratan: Yes. As mentioned in the article, the Pega Infinity '23 release will not support embedded third-party services, and not only Kafka; please see this for the overall scope.
Queue processors will still run on the nodes based on your node type configuration. Queue processor rules rely on the stream nodes, which will now connect to external Kafka for processing.
Thanks.


Areteans Technology Solutions
AU
@ArulDevan Thanks for your response.
Does this change mean that, as a Pega consumer, I need a functional and stable enterprise Kafka setup to support queue processors in Pega?
If, as an organization, I don't use Kafka, will Pega queue processors stop working?


NCS Pte. Ltd
SG
Hi @Ratan, yes, that's right. If an enterprise Kafka setup is available, you can connect to it; otherwise, you can have a dedicated Kafka installation. Please see the Platform Support Guide for supported version details. Thanks.


Areteans Technology Solutions
AU
@ArulDevan Apologies. That response didn't really help with any of the questions asked.


JPMorgan Chase
IN
Hi Ratan,
1. For queue processors to work, the customer needs either an embedded Kafka or an external Kafka. Going forward, embedded Kafka is not supported by the platform, so clients must use external Kafka to use the Queue Processor feature.
2. When Kafka is externalised, queue-processor data sits in the external Kafka as per the configuration. Queue processors continue to run on the configured nodes, reading messages from the respective external Kafka.
3. If the platform is not configured with any stream nodes, queue processors will not work.
4. Reading messages, accessing Kafka queues and topics, and administering queue and topic creation are all managed by the stream-node code in the Pega engine. External Kafka does not need access to the Pega engine or the database, because the Pega engine manages Kafka once the connection is established; Kafka does not need to know any Pega engine details. The relationship is one-way.
Why this approach? You retain control of your own Kafka: platform licensing, data encryption, and guaranteed resiliency. With embedded Kafka in the cloud, a persistent volume for the Pega engine to store data is too costly, resiliency is challenged, and replication between pods is delayed when clusters span multiple geographic locations. Ingress and egress are granted to fixed IPs, whereas in the cloud it is the load balancers, not the pods, that have fixed IPs to handle Kafka data replication. Considering all the future modernization, it is recommended to leave embedded Kafka behind and engage external Kafka.
Updated: 17 Apr 2023 23:11 EDT


Areteans Technology Solutions
AU
@SriHarsha Anika Thanks for your response. Say we have an environment with 1 background processing node and 1 externalised stream node. If I configure a queue processor rule and associate it with the background processing node, where does the queue processor actually process an event? I believe this happens in the background processing node.
1) This would mean that the external Kafka instance isn't part of the Pega cluster, as it doesn't run the Pega app server.
2) The auto-generated data flow of the queue processor contains the logic to process the event. This data flow runs on the background processing node as well.
Please validate the above points.
So, what's the advantage of externalising the stream nodes?
To improve performance, we would add more background processing nodes to run more queue processors. So, what's the value-add of the externalised Kafka stream nodes?
Thanks,
Ratan Balaji.
Updated: 18 Apr 2023 20:53 EDT


JPMorgan Chase
IN
Yes, the external Kafka instance does not run the Pega app server.
So, what's the advantage of externalising the stream nodes? -> Data is placed in topics dedicated to each queue processor; Kafka balances and replicates it and performs the data streaming that actually drives queue processing, and the stream node takes care of scaling this across all the available nodes.
When a queue entry is made, it is stored in Kafka. If processing fails, the failure record is stored in the database for administrative purposes.
When Kafka runs inside the Pega engine, the administrative side is challenged: upgrades and updates of the internal Kafka, and encryption for data security, are all in question.
To give clients full control over streaming, security, data protection, and maintenance costs, it has now been externalised outside the engine.


Areteans Technology Solutions
AU
@SriHarsha Anika Thanks for the quick response. I totally understand the advantages of externalising Kafka, but I believe the answer is still incomplete.
When a queue entry is performed, it gets stored in Kafka.
1) When a page is queued, the data is queued as an event and stored in Kafka. Is the data then handed over to the auto-generated queue processor data flow running on the background node for processing?
To improve the performance, we would add more background processing nodes to have more queue processors. So, what's the value add of the externalised kafka stream nodes?
2) What's the impact/influence of adding more Kafka nodes vs. adding more background processing nodes?
Thanks,
RB
Accepted Solution
Updated: 25 Apr 2023 18:53 EDT


JPMorgan Chase
IN
Yes, the data is handed over to the auto-generated data flow.
The data flow runs on the background node with the support of a stream node; without a stream node, the data flow won't work.
The queue processor (running on the background node) gets its data from the data flow (running with the support of the stream node).
Adding stream nodes scales the streaming tier, giving higher throughput with messages spread across nodes.
Adding background nodes scales queue processing both horizontally (more nodes) and vertically (more threads per node), for higher throughput.
The number of threads running for a given queue processor should always match the number of partitions.
For example: you have a queue processor named Ratan configured to run with 5 threads per background node, 2 stream nodes with a default partition count of 5, and 4 background nodes configured.
Data is replicated on both stream nodes.
Let's assume each record in the queue takes 1 minute to process.
With this configuration, 20 records can be processed per minute (4 background nodes x 5 threads each = 20 parallel data flow runs), speeding up queue execution.
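The arithmetic in this example can be sketched in a few lines of Python (the function name and figures are illustrative only, not part of any Pega API):

```python
# Back-of-the-envelope throughput estimate for a queue processor,
# using the illustrative figures from this thread (not a Pega API).

def records_per_minute(background_nodes, threads_per_node, minutes_per_record):
    """Upper bound on records processed per minute, assuming each thread
    maps to its own Kafka partition and no thread is ever idle."""
    parallel_runs = background_nodes * threads_per_node
    return parallel_runs / minutes_per_record

# 4 background nodes x 5 threads each, 1 minute per record
print(records_per_minute(4, 5, 1.0))  # 20.0
```

Real throughput will be lower than this ceiling once partition skew, rebalances, and database contention are factored in.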
Kafka handles the queue balancing, message replication, dequeuing messages once they are processed, refreshing with the latest messages, and rebalancing.
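The point about matching thread count to partition count can be illustrated with a small simulation. This uses a simple round-robin assignment, a deliberate simplification of Kafka's real, pluggable assignors (range, round-robin, sticky), just to show why extra threads end up idle:

```python
# Simplified model of spreading Kafka partitions across consumer threads.
# Not Kafka's actual assignor; a round-robin stand-in for illustration.

def assign_partitions(num_partitions, num_threads):
    """Map each thread index to the list of partitions it would consume."""
    assignment = {t: [] for t in range(num_threads)}
    for p in range(num_partitions):
        assignment[p % num_threads].append(p)
    return assignment

# 5 partitions, 5 threads: one partition per thread, full parallelism
print(assign_partitions(5, 5))
# 5 partitions, 8 threads: threads 5, 6 and 7 get nothing and sit idle
print(assign_partitions(5, 8))
```

A partition can be consumed by at most one thread in a consumer group at a time, which is why threads beyond the partition count are wasted capacity.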
The value of external Kafka lies in data security, administration, partitioning, sizing, data resiliency, licensing, and staying current with the latest releases, none of which are fully addressed by embedded Kafka. Messages getting stuck, unprocessed, or lost; restarting ZooKeeper to rebalance messages; and many more administrative tasks are simply not doable with embedded Kafka.
For example, if a client has an AWS Kafka enterprise license with a full feature set, why would the client want embedded Kafka with limited privileges, little or no administrative access, and no control? In the cloud, a pod and its data are not guaranteed. Cloud is pay-per-use: if stream nodes are not active and usage falls below normal thresholds, the pods in Kubernetes will go down, and all the Kafka folders inside the embedded Kafka will be deleted unless a persistence volume is attached.
Let's assume your production runs in Cluster 1, a North America Boston data center, which goes down because of a cyclone; for business continuity you spin up a North America Washington data center cluster. The data is physically stored on the Cluster 1 embedded node and cannot be replicated to Cluster 2, because you have lost the connection.
External Kafka addresses all of this: it performs the replication, rebalancing, dequeueing, and enqueueing between the active and inactive clusters, so client data is safe, resilient, and ready for business continuity.


Areteans Technology Solutions
AU
@SriHarsha Anika Thanks for taking the time to respond to this thread.