Pega Marketing Cluster - Startup/Shutdown errors
I am hosting a multi-node Pega Marketing environment in a non-Pega client cloud. I was able to bring up a cluster of 8 nodes successfully (see the attachment for the node types).
There were numerous errors during startup and shutdown, and I need to know whether these are expected.
1) When shutting down the nodes in the correct order, I got this error on the remaining nodes when the last decisioning node was removed (a Cassandra driver sketch illustrating this is included after the list):
"2020-10-28 01:59:48,003 [ster3-reconnection-0] [ ] [ ] [ ] ( driver.core.ControlConnection) ERROR - [Control connection] Cannot connect to any host, scheduling retry in 30000 milliseconds"
2) When the last stream node was removed, I got multiple "Waiting for Kafka sync updates" INFO messages on the remaining nodes, and then this exception before shutdown completed (see the Kafka sketch after the list):
"ERROR - Brokers=[{ id:1001, status:Online, prpcNodeId:backgroundnode1, controller:true, partitionsCount:382, underReplicatedPartitionsCount:380 } { id:1002, status:Offline, prpcNodeId:backgroundnode2, controller:false, partitionsCount:383, underReplicatedPartitionsCount:380 }], rollingRestartReady=false com.pega.dsm.dnode.api.StreamServiceException: Kafka syncing timeout: cluster state didn't change for 300000 ms"
3) When restarting the nodes in the cluster in the correct order, I got this error when starting the first decisioning (i.e. Cassandra) node (see the consistency-level sketch after the list):
"2020-10-28 02:22:12,143 [ionChangeExecutor:41] [ STANDARD] [ ] [ ] (andraSessionCache$SessionProxy) ERROR - Statement could not be executed, because there was no available host com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /{ipaddressremoved}:9042 (com.datastax.driver.core.exceptions.UnavailableException: Not enough replicas available for query at consistency ONE (1 required but only 0 alive)))"
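For reference on item 1, here is a minimal sketch of a standalone DataStax Java driver 3.x client; the contact point IP is a placeholder for one of the decisioning nodes, not something taken from my environment. If the last Cassandra node goes away while a session is open, the driver's control connection is left with no hosts and logs the same "Cannot connect to any host, scheduling retry in 30000 milliseconds" message while it retries in the background.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.exceptions.NoHostAvailableException;

public class ControlConnectionCheck {
    public static void main(String[] args) {
        // Placeholder contact point; substitute a decisioning node's address.
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")
                .withPort(9042)
                .build();
             Session session = cluster.connect()) {
            // While this session is open, shutting down the last Cassandra node
            // leaves the control connection with no hosts; the driver then logs
            // "Cannot connect to any host, scheduling retry in 30000 milliseconds"
            // and keeps retrying in the background.
            session.execute("SELECT release_version FROM system.local");
        } catch (NoHostAvailableException e) {
            // Thrown if no contact point is reachable at connect/query time.
            System.err.println("No Cassandra host reachable: " + e.getMessage());
        }
    }
}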
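For item 2, the broker list in the exception shows 380 under-replicated partitions, so the stream service appears to be waiting for the remaining broker to catch up and gives up after 300000 ms. Below is a minimal sketch of checking the embedded Kafka cluster state with the standard Kafka AdminClient; the bootstrap address and port are assumptions and would need to match the stream node's actual Kafka listener.

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeClusterResult;
import java.util.Properties;

public class KafkaClusterCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed bootstrap address; point this at a stream node's Kafka listener.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "backgroundnode1:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            DescribeClusterResult cluster = admin.describeCluster();
            System.out.println("Controller:   " + cluster.controller().get());
            System.out.println("Live brokers: " + cluster.nodes().get());
            // With one of the two brokers offline, partitions with a replication
            // factor of 2 stay under-replicated, which matches
            // rollingRestartReady=false and the sync wait eventually timing out.
        }
    }
}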
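For item 3, the inner UnavailableException suggests the first node to come up was reachable, but no replica owning the requested data was alive yet. Here is a sketch of a query at consistency ONE that fails the same way while the other decisioning nodes are still down; the contact point, keyspace, and table names are placeholders for illustration only.

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.exceptions.NoHostAvailableException;

public class ConsistencyOneCheck {
    public static void main(String[] args) {
        // Placeholder contact point, keyspace, and table.
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")
                .withPort(9042)
                .build();
             Session session = cluster.connect()) {
            SimpleStatement stmt =
                    new SimpleStatement("SELECT * FROM my_keyspace.my_table LIMIT 1");
            stmt.setConsistencyLevel(ConsistencyLevel.ONE);
            // Consistency ONE needs at least one live replica that owns the data.
            // With the other decisioning nodes still down, 0 of the 1 required
            // replicas are alive, so the coordinator reports UnavailableException,
            // which the driver wraps in NoHostAvailableException as in the log.
            session.execute(stmt);
        } catch (NoHostAvailableException e) {
            System.err.println("Query failed on all hosts: " + e.getErrors());
        }
    }
}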
After receiving each of these errors, I continued with the startup/shutdown and can confirm they happen every time. Can someone please confirm whether these errors are expected?