Decision data store nodes JOINING_FAILED
This morning I found the decision data store node in Pega Labs in STOPPED status.
I tried to start it, but now it is in JOINING_FAILED status, and the status does not change when I try to start or decommission it.
Diagnostics shows "DNodeException: Cannot bootstrap cassandra" error message and the stack trace says this error is "Caused by: com.pega.dsm.dnode.api.DNodeException: Unable to start DDS. Process has exited with code: 137 OpenJDK 64-Bit Server VM warning: Cannot open file /opt/tomcat/cassandra/logs/gc.log due to No such file or directory".
I found these warnings in the Cassandra log file:
WARN [main] 2020-06-11 03:31:27,626 NativeLibrary.java:187 - Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.
WARN [main] 2020-06-11 03:31:27,650 StartupChecks.java:136 - jemalloc shared library could not be preloaded to speed up memory allocations
WARN [main] 2020-06-11 03:31:27,651 StartupChecks.java:169 - JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info.
INFO [main] 2020-06-11 03:31:27,911 SigarLibrary.java:44 - Initializing SIGAR library
WARN [main] 2020-06-11 03:31:33,415 SigarLibrary.java:174 - Cassandra server running in degraded mode. Is swap disabled? : true, Address space adequate? : true, nofile limit adequate? : false, nproc limit adequate? : true

I don't know if it is related, but the ADMSnapshot agent was also stopped (although I was able to restart it).
What does exit code 137 mean, and what is the correct way to restart the decision data store node in this case?