Question
Blue Shield of California
US
Last activity: 2 Jun 2023 9:57 EDT
Admin Studio displays 0 connected nodes, but PDC shows all nodes are active. How can I troubleshoot this?
Hello,
First and foremost, thank you for reviewing this challenge we are facing. I have searched for similar issues and found a few posts similar to mine. Most are abandoned by the original submitter while others do not provide a satisfactory reply. I hope I can break that cycle.
Summary of issue:
When we restart our servers for any reason, all nodes initially display in Admin Studio. After some time, this changes and Admin Studio shows 0 nodes running. I've confirmed in PDC that the nodes are indeed running (I can also navigate to the services section in Dev Studio and see the stream nodes running, etc.). Furthermore, once this has occurred, we can no longer stop or start listeners or see any requestors. I've witnessed this in multiple environments ranging from DEV to PROD. The only fix, and it has been temporary, is to restart the servers.
For reference, we are using Pega Platform 8.6.3 (on-premises). We have 4 web nodes and 2 background processing/stream nodes. This issue has been occurring for a few months.
What we've tried so far (per Pega GCS):
- Set the prconfig setting cluster/hazelcast/v4/enabled=true explicitly to point to Hazelcast 4.x.
- Added <env name="identification/cluster/protocol" value="hazelcast" /> to the prconfig.
- On cluster restart, checked for any stale entries in the pr_sys_statusnodes table.
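For anyone comparing configurations, the two prconfig settings above would sit in prconfig.xml roughly as sketched below. This is an illustration and not a copy of our actual file: it assumes the usual <pegarules> root element, and writing the first setting as an <env> entry is an assumed equivalent of the key=value form quoted above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<pegarules>
    <!-- Explicitly select the Hazelcast 4.x clustering implementation
         (assumed <env> form of cluster/hazelcast/v4/enabled=true) -->
    <env name="cluster/hazelcast/v4/enabled" value="true" />

    <!-- Set the cluster protocol to Hazelcast, as listed above -->
    <env name="identification/cluster/protocol" value="hazelcast" />
</pegarules>
```

If these entries were applied, they would need to be identical on every node (all 4 web nodes and both background processing/stream nodes) and would only take effect after a restart.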
Has anyone else faced this and solved this pervasive issue? Any ideas would be great!