ISPN000299: Unable to acquire lock after 15 seconds for key SessionCreationMetaDataKey
This is related to incident INC-268701.
We recently upgraded our application from Pega 7.2 to Pega 8.6.5. For the first 2 days after migrating to prod, everything worked fine. From the 3rd day onwards, however, we started seeing a lot of slowness, along with the following error when logging in with administrator access: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 15 seconds for key SessionCreationMetaDataKey(NCb-6AkmthbIwwoZpSMc8kqoeRwtJ9G-JtQ5hYWu) and requestor GlobalTx:Pega-MX-slave2-mxcyvlpras2005:pega-mx-server-one:2036. Lock is held by GlobalTx:Pega-MX-slave2-mxcyvlpras2005:pega-mx-server-one:2034
We're also seeing a lot of deadlocks related to the job scheduler PZPURGEPRSYSSTATUSNODES in the logs:
ERROR - [PersistentJobExecutionFactory] Job[pzPurgePRSysStatusNodes] execution lock has failed. com.pega.pegarules.pub.database.LockFailureException: Exception occurred while retrieving existing lock PZPURGEPRSYSSTATUSNODES: code: <none> SQLState: Problem executing lock check: code: 1205 SQLState: 40001 Message: Transaction (Process ID 83) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. DatabaseException caused by prior exception: com.microsoft.sqlserver.jdbc.SQLServerException: Transaction (Process ID 83) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. | SQL Code: 1205 | SQL State: 40001
We have also noticed the following:
- The slowness is only observed for dev/ops/admin users, i.e. when accessing Dev Studio, Admin Studio, or App Studio. Our branch users who access the applications are fine. The impact therefore falls mainly on the ops team, who need to access Admin Studio, and it would also affect importing packages during deployments.
- Somehow all 6 nodes got configured as STREAM nodes, and the job scheduler PZPURGEPRSYSSTATUSNODES runs on all nodes.
Has anyone else faced similar issues? What is the impact if all nodes are configured as stream nodes?