Question
Pegasystems Inc.
IN
Last activity: 27 Feb 2024 4:50 EST
Externalized services (Hazelcast, Kafka and Search) monitoring
- How will an application behave in the following scenarios?
- When Hazelcast is down?
- When external Kafka nodes are down or replicas are corrupted?
- Were any new DB tables introduced to monitor HZ nodes with cluster information such as cluster_ID and node information?
- If a node is unhealthy and the application is recycled, will that node rejoin the same Hazelcast cluster, or will it behave like a split-brain?
- How can stream nodes be validated, e.g. partitions, leaders, under-replicated and offline counts?
- Currently, a cluster restart must follow an order (Stream first, followed by Search and Web nodes). Should the same process be followed?
- What monitoring tools are recommended to monitor Hazelcast and Kafka services for on-premise clients?
Vinay Karumanchi -
Accepted Solution
Updated: 27 Feb 2024 4:50 EST
Pegasystems Inc.
IN
@Shakes Sharing the information I have gathered
Hazelcast
- How will the application behave while the Hazelcast nodes are down?
- If a single Hazelcast node goes down, restarting it makes the node rejoin the cluster, and you should not see any other issues.
- If the complete Hazelcast cluster is down, a restart of the complete cluster (including the Pega tiers) is required.
- Were any new DB tables introduced to monitor externalized Hazelcast nodes with cluster information such as cluster_ID and node information?
- No, no DB table was added to monitor Hazelcast node information. We can use the health endpoints to get node information.
- Will unhealthy nodes rejoin the same Hazelcast cluster when they are restarted, or will we see a split-brain issue?
- Yes, the nodes rejoin the same Hazelcast cluster after a node restart.
- Which monitoring tools should be used to watch the Hazelcast server?
- We can use the same monitoring tools used for monitoring the Pega nodes.
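The health endpoints mentioned above are not named in this thread. As a minimal sketch, assuming they refer to Hazelcast's REST health check (served under /hazelcast/health when the REST API and its health-check endpoint group are enabled on the member), a small poller could look like the following. The base URL, the expected cluster size, and the JSON field names are assumptions to verify against your Hazelcast version.

```python
import json
import urllib.request
from typing import List


def evaluate_health(payload: dict, expected_size: int) -> List[str]:
    """Return the problems found in a health payload (an empty list means healthy).

    Field names are assumed to match the JSON returned by the member's
    health check; adjust them if your version reports something different.
    """
    problems = []
    if payload.get("nodeState") != "ACTIVE":
        problems.append(f"node state is {payload.get('nodeState')}")
    if payload.get("clusterState") != "ACTIVE":
        problems.append(f"cluster state is {payload.get('clusterState')}")
    if payload.get("clusterSafe") is not True:
        problems.append("cluster is not reported as safe")
    size = payload.get("clusterSize")
    if size != expected_size:
        # A member that sees fewer peers than expected may be on one side of a split.
        problems.append(f"cluster size is {size}, expected {expected_size}")
    return problems


def fetch_health(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch the health payload from one member (base_url is a hypothetical value)."""
    with urllib.request.urlopen(f"{base_url}/hazelcast/health", timeout=timeout) as resp:
        return json.load(resp)
```

Calling `evaluate_health(fetch_health("http://hz-member:5701"), expected_size=3)` against each member and alerting on a non-empty result is one way to notice a member that restarted but did not rejoin the expected cluster.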
Kafka
- How does the application behave when external Kafka nodes are down or replicas are corrupted?
- The user experience is the same as with embedded Kafka. The application slows down when Kafka is down or has offline replicas. Any new messages to be processed by queue processors (QP) are stored in the delayed queue table until the Kafka cluster is back online.
- How can stream services be validated, e.g. partitions, leaders, under-replicated and offline counts?
- New Relic dashboards can be created for the Kafka service. MSK also already provides many dashboard widgets that can be used for monitoring.
- Currently, a cluster restart must follow an order. Should the same process be followed?
- After Pega 8.4 there is no specific restart order; nodes can be restarted in any order. However, to avoid failures, it is advisable to start the nodes that accept traffic last.
- What should be the default heap memory configuration for externalized Kafka?
- Typically 1 GB is fine, which is also our default in embedded Kafka. The heap can be increased to 2 GB in clusters with more than 1000 partitions. This must be evaluated in a lower environment before applying it to production.
Note: Once Kafka is externalized, stream nodes are no longer needed.
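The partition checks asked about above (leaders, under-replicated, offline) can also be computed from partition metadata, whichever tool supplies it. This is a minimal sketch, assuming the metadata has already been collected (for example from the output of `kafka-topics.sh --describe` or from an admin client) into the hypothetical `Partition` records below. It uses the usual Kafka definitions: a partition is under-replicated when its in-sync replica set is smaller than its replica set, and offline when it has no leader.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Partition:
    """Hypothetical record for one topic partition's metadata."""
    topic: str
    partition: int
    leader: Optional[int]  # broker id, or None / negative when there is no leader
    replicas: List[int]    # broker ids assigned to this partition
    isr: List[int]         # broker ids currently in sync


def has_leader(p: Partition) -> bool:
    return p.leader is not None and p.leader >= 0


def summarize(partitions: List[Partition]) -> Dict[str, object]:
    """Count under-replicated and offline partitions and leaders per broker."""
    leaders_per_broker: Dict[int, int] = {}
    for p in partitions:
        if has_leader(p):
            leaders_per_broker[p.leader] = leaders_per_broker.get(p.leader, 0) + 1
    return {
        "partitions": len(partitions),
        "under_replicated": sum(1 for p in partitions if len(p.isr) < len(p.replicas)),
        "offline": sum(1 for p in partitions if not has_leader(p)),
        "leaders_per_broker": leaders_per_broker,
    }
```

A non-zero `offline` count means some partitions cannot be read or written at all, so it deserves attention before a non-zero `under_replicated` count, which only signals reduced redundancy.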
Updated: 27 Feb 2024 4:50 EST
Pegasystems Inc.
GB
@Ganesh Sharma can you confirm that you've first gone through the available documentation?
Externalization of services in your deployment
External Kafka in your deployment
External Elasticsearch in your deployment
External Hazelcast in your deployment
Third-party externalized services Deployment Changes FAQs
Monitoring an embedded stream service > Database
Understanding pr_data_stream tables > Nodes_
Externalisation of Kafka service
In Pega 8.8, if Hazelcast is down, it can impact the application's performance and functionality. Features that rely on Hazelcast, such as distributed sessions, decisioning services, and others, may not function as expected. If the external Kafka nodes are down or there is corruption of replicas, it can disrupt the functioning of Queue Processors and Data Flows, as they rely on Kafka for message processing. This can lead to delays or failures in processing queued items. In both cases, Pega has implemented resilience measures to handle such scenarios. However, it's important to monitor and maintain the health of these services to ensure optimal application performance.
In Pega 8.8, the pr_sys_statusnodes table is used to monitor the status of nodes in a Hazelcast cluster. If a node is in an unhealthy state and the application is recycled, the node should rejoin the same Hazelcast cluster, provided the network settings and cluster configuration have not changed.
To validate stream nodes, you can use the pr_data_stream_nodes table which contains information about the Kafka cluster, including the list of all known Stream nodes, topics, data partition distribution, and the current controller node.
For cluster restarts, it is recommended to follow the same process of restarting Stream nodes first, followed by Search and Web nodes.
For monitoring Hazelcast and Kafka services, Pega provides built-in tools like PDC for system health monitoring. For on-premise clients, additional monitoring tools can be used based on the organization's preference and infrastructure, such as Prometheus, Grafana, or any other tool that supports JMX monitoring.
⚠ This is a GenAI-powered tool. All generated answers require validation against the above provided references.
Please check internally with our HZ and Kafka experts if you need more help with this.
Please mark Accept Solution if you're happy with the provided resources.
Pegasystems Inc.
IN
@MarijeSchillern Yes, I did, and I got my answers from the Pega product team. You can mark this closed.
Northbridge
CA
@Ganesh Sharma Hi Ganesh,
Do you mind sharing the answers here as we are also looking for the same, thanks.