Introduction
Earlier versions of Pega Platform™ adopted certain third-party software products, including Hazelcast, as embedded services to deliver client data and stabilize the computing resources of Pega Platform applications. Beginning with Pega Platform version 8.8, the software became fully microservice-oriented. Provisioning third-party solutions as services separate from Pega Platform has several key benefits, including greater security, easier maintenance, improved agility and scalability, better performance and stability, and a modernized deployment. For details of these benefits, see Externalization of services in your deployment.
Pega acknowledges, however, that you may face challenges with externalization. For Hazelcast, Pega will continue to offer support for embedded Hazelcast in 24.2. This extension applies to embedded Hazelcast only; the other third-party software products (Cassandra, Kafka, and Elasticsearch) must still be externalized for all clients using release 24.2.
There may be exceptions to the support for embedded Hazelcast. Clients have reported a number of issues to Pega's support organization whose root cause, after investigation, turned out to be the embedded Hazelcast software. Those issues were resolved by externalizing Hazelcast.
If you are encountering issues like those described below, Pega strongly recommends that you externalize Hazelcast to improve your operational stability, even though embedded Hazelcast is still supported.
Symptoms
A number of different symptoms can point to a problem with embedded Hazelcast. The following descriptions cover some common situations, but there can be others.
1. Hazelcast partition-lost exceptions
Errors:
- Node (IP: [11.111.1.111]:1111) is not responding. Check the logs on [11.111.1.111]:1111 for more information
- partition was lost: com.hazelcast.partition.PartitionLostEvent{partitionId=178, lostBackupCount=5, eventSource=[11.111.1.111]:1111}
- com.hazelcast.core.OperationTimeoutException: QueryPartitionOperation got rejected before execution due to not starting within the operation-call-timeout of: 60000 ms
2. “Split-brain” or cluster fracturing issue
In highly available clustered environments, you might notice that certain nodes in your cluster cannot see one another. A node may appear as if it is the only active node, and the Cluster Management page does not show all the nodes in your Pega deployment. Upon inspection, you determine that some nodes have formed a separate cluster. This error condition is sometimes referred to as split-brain syndrome or cluster fracturing.
For details on this situation, see Split-Brain syndrome and Cluster Fracturing.
Note that several of the symptoms described below can be caused by “split brain.”
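If you suspect cluster fracturing, logging Hazelcast membership changes on each node can make regroupings visible in the node logs. The following Java sketch uses the standard Hazelcast 3.x MembershipListener API; it is illustrative only, is not part of Pega Platform, and assumes you already have a reference to the embedded HazelcastInstance.

import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.MemberAttributeEvent;
import com.hazelcast.core.MembershipEvent;
import com.hazelcast.core.MembershipListener;

// Illustrative listener that logs every membership change so that
// cluster regroupings ("split brain") stand out in the logs.
public class ClusterChangeLogger implements MembershipListener {

    @Override
    public void memberAdded(MembershipEvent event) {
        System.out.println("Member joined: " + event.getMember()
                + ", cluster size now " + event.getMembers().size());
    }

    @Override
    public void memberRemoved(MembershipEvent event) {
        System.out.println("Member left: " + event.getMember()
                + ", cluster size now " + event.getMembers().size());
    }

    @Override
    public void memberAttributeChanged(MemberAttributeEvent event) {
        // Not relevant for split-brain diagnosis; required by the 3.x interface.
    }

    // 'instance' is assumed to be a reference to the embedded HazelcastInstance.
    public static void register(HazelcastInstance instance) {
        instance.getCluster().addMembershipListener(new ClusterChangeLogger());
    }
}

Each join or leave event is logged with the resulting cluster size, so frequent regroupings become easy to correlate with the symptoms below.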
3. Admin Studio showing inconsistent status for nodes, listeners, agents, or other general information
A node appears to be healthy and users can log in to it directly; however, the node does not appear on the Admin Studio (Cluster Management) page. This type of inconsistent information can also occur for listeners, agents, and other background processes that are normally visible on the Admin Studio page.
In a Pega deployment that includes embedded Hazelcast, the same JVM hosts both Pega Platform and the Hazelcast execution engine. This means that if the Pega Platform application consumes all the system memory or CPU (for example, a badly designed report that tries to display all two million rows from a database table), Hazelcast is affected by the same issue.
If the JVM is busy with Pega work, Hazelcast "heartbeats" are missed, which triggers a cluster regrouping, and the node can disappear from the Admin Studio page. Whenever this occurs and a node is no longer part of the cluster, exceptions occur. Pega relies on Hazelcast clustering technology, and if nodes frequently leave or join the cluster, the stability of the entire cluster is disrupted. (A heartbeat-tuning sketch follows the error examples below.)
Errors:
1. Could not connect to: /123.000.0.15:5701. Reason: SocketTimeoutException[null]
2023-08-30 23:54:22,306 [68c.cached.thread-11] [ ] [ ] (cp.TcpIpConnectionErrorHandler) WARN - [123.000.15.248]:5701 [bf3b51f9c00147c8a548e86931a7f68c] [3.12.10] Removing connection to endpoint [123.000.0.15]:5701 Cause => java.net.SocketTimeoutException {null}, Error-Count: 35
2023-08-30 23:54:22,406 [68c.cached.thread-14] [ ] [ ] (nio.tcp.TcpIpConnector) INFO - [123.000.15.248]:5701 [bf3b51f9c00147c8a548e86931a7f68c] [3.12.10] Connecting to /123.000.0.15:5701, timeout: 10000, bind-any: true
2. java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_372]
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_372]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_372]
at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[?:1.8.0_372]
3. Could not connect to: /123.000.15.247:5701. Reason: SocketException[Connection refused to address /123.000.15.247:5701]
4. Unable to fetch running listeners from nodes [util-i-03e4b00000a8566e4, util-i-0c7cb5e61b0000fb8]
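As a stopgap, the Hazelcast heartbeat tolerance can be raised so that a briefly saturated JVM is not immediately removed from the cluster. The sketch below sets the standard Hazelcast property hazelcast.max.no.heartbeat.seconds programmatically on a member Config. This is illustrative only: how, and whether, such a property can be applied to your Pega deployment depends on your configuration, and raising the timeout only masks the underlying resource contention.

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class HeartbeatTuningExample {
    public static void main(String[] args) {
        Config config = new Config();

        // Tolerate up to 300 seconds without a heartbeat before a member
        // is considered unreachable and dropped from the cluster.
        config.setProperty("hazelcast.max.no.heartbeat.seconds", "300");

        HazelcastInstance member = Hazelcast.newHazelcastInstance(config);
        System.out.println("Cluster members: " + member.getCluster().getMembers());
    }
}

A longer tolerance reduces spurious regroupings during short CPU or memory spikes, but it also delays detection of genuinely failed nodes, so it is a trade-off rather than a fix.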
Solution
First, it is important to be on the latest version of Pega Platform.
Workarounds exist for all of these issues; usually a full cluster restart is recommended. However, this is not a permanent solution, as the problems can recur.
The permanent solution is to externalize Hazelcast by switching to a client-server topology. This allows Hazelcast to run in a separate JVM, which is much more stable, as the sketch below illustrates. For details on the externalization process, see External Hazelcast in your Deployment.
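For reference, here is what a client-server topology looks like in plain Hazelcast terms: the application connects as a lightweight client to Hazelcast members that run in their own JVMs. The sketch uses the standard Hazelcast Java client API with a placeholder server address; in practice, Pega performs the externalization through its documented configuration, not through hand-written client code.

import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;

public class ExternalHazelcastClientExample {
    public static void main(String[] args) {
        ClientConfig clientConfig = new ClientConfig();

        // Placeholder address of an externally managed Hazelcast member;
        // the client JVM stays separate from the server JVMs.
        clientConfig.getNetworkConfig().addAddress("hazelcast-server.example.com:5701");

        HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);

        // The client uses the same data-structure APIs as an embedded member.
        client.getMap("exampleMap").put("key", "value");
    }
}

Because the cluster members run in separate JVMs, a memory or CPU problem in the application JVM no longer destabilizes the Hazelcast cluster, which is the stability benefit that externalization provides.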