problem starting jboss on pega server

Question

FastwebPegaS

Member since 2018

3 posts

Fastweb SPA

Posted: 2 Aug 2022 4:56 EDT
Last activity: 18 Oct 2022 14:03 EDT

Solved

Hi, we have a problem starting the third JBoss instance on our Pega application server. The error that occurs is:

19:06:44,487 WARNING [com.hazelcast.spi.impl.BasicInvocation] (hz._hzInstance_1_668e03f97c14a138d2c4267ce8ce03b2.response) [xx.xx.xx.xxx]:5702 [668e03f97c14a138d2c4267ce8ce03b2] [3.2] Retrying invocation: BasicInvocation{ serviceName='hz:impl:mapService', op=com.hazelcast.map.operation.MapKeySetOperation@7c9fb2eb, partitionId=130, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=240, callTimeout=60000, target=Address[xx.xx.xx.xxx]:5701}, Reason: com.hazelcast.spi.exception.PartitionMigratingException: Partition is migrating! this:Address[xx.xx.xx.xxx]:5701, partitionId: 130, operation: com.hazelcast.map.operation.MapKeySetOperation, service: hz:impl:mapService
19:06:49,531 STDERR [stderr] (asxxxxxx.intranet.fw) com.hazelcast.spi.exception.PartitionMigratingException: Partition is migrating! this:Address[xx.xx.xx.xxx]:5701, partitionId: 130, operation: com.hazelcast.map.operation.MapKeySetOperation, service: hz:impl:mapService
19:06:49,531 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.hazelcast.spi.impl.BasicOperationService.processOperation(BasicOperationService.java:344)
19:06:49,531 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.hazelcast.spi.impl.BasicOperationService.processPacket(BasicOperationService.java:309)
19:06:49,532 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.hazelcast.spi.impl.BasicOperationService.access$400(BasicOperationService.java:102)
19:06:49,532 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.hazelcast.spi.impl.BasicOperationService$BasicOperationProcessorImpl.process(BasicOperationService.java:756)
19:06:49,532 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.hazelcast.spi.impl.BasicOperationScheduler$PartitionThread.process(BasicOperationScheduler.java:276)
19:06:49,532 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.hazelcast.spi.impl.BasicOperationScheduler$PartitionThread.doRun(BasicOperationScheduler.java:270)
19:06:49,533 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.hazelcast.spi.impl.BasicOperationScheduler$PartitionThread.run(BasicOperationScheduler.java:245)
19:06:49,533 STDERR [stderr] (asxxxxxx.intranet.fw)     at ------ End remote and begin local stack-trace ------.(Unknown Source)
19:06:49,533 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.hazelcast.spi.impl.BasicInvocation$InvocationFuture.resolveResponse(BasicInvocation.java:836)
19:06:49,534 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.hazelcast.spi.impl.BasicInvocation$InvocationFuture.resolveResponseOrThrowException(BasicInvocation.java:769)
19:06:49,534 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.hazelcast.spi.impl.BasicInvocation$InvocationFuture.get(BasicInvocation.java:696)
19:06:49,534 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.hazelcast.spi.impl.BasicInvocation$InvocationFuture.get(BasicInvocation.java:674)
19:06:49,534 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.hazelcast.spi.impl.BasicOperationService.invokeOnPartitions(BasicOperationService.java:613)
19:06:49,535 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.hazelcast.spi.impl.BasicOperationService.invokeOnAllPartitions(BasicOperationService.java:549)
19:06:49,535 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.hazelcast.map.proxy.MapProxySupport.keySetInternal(MapProxySupport.java:573)
19:06:49,535 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.hazelcast.map.proxy.MapProxyImpl.keySet(MapProxyImpl.java:479)
19:06:49,535 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.pega.pegarules.cluster.internal.PRClusterHazelcastImpl.checkMembershipConsistency(PRClusterHazelcastImpl.java:418)
19:06:49,536 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.pega.pegarules.session.internal.mgmt.PRNodeImpl.checkClusterConsistency(PRNodeImpl.java:2397)
19:06:49,536 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.pega.pegarules.session.internal.mgmt.PREnvironment.getThreadAndInitialize(PREnvironment.java:374)
19:06:49,536 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.pega.pegarules.session.internal.PRSessionProviderImpl.getThreadAndInitialize(PRSessionProviderImpl.java:1905)
19:06:49,536 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.pega.pegarules.session.internal.engineinterface.etier.impl.EngineStartup.initEngine(EngineStartup.java:657)
19:06:49,537 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.pega.pegarules.session.internal.engineinterface.etier.impl.EngineImpl._initEngine_privact(EngineImpl.java:165)
19:06:49,537 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.pega.pegarules.session.internal.engineinterface.etier.impl.EngineImpl.doStartup(EngineImpl.java:138)
19:06:49,537 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.pega.pegarules.web.servlet.WebAppLifeCycleListener._contextInitialized_privact(WebAppLifeCycleListener.java:280)
19:06:49,538 STDERR [stderr] (asxxxxxx.intranet.fw)     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
19:06:49,539 STDERR [stderr] (asxxxxxx.intranet.fw)     at java.lang.reflect.Method.invoke(Method.java:606)
19:06:49,539 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.pega.pegarules.internal.bootstrap.PRBootstrap.invokeMethod(PRBootstrap.java:338)
19:06:49,539 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.pega.pegarules.internal.bootstrap.PRBootstrap.invokeMethodPropagatingThrowable(PRBootstrap.java:379)
19:06:49,539 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.pega.pegarules.boot.internal.extbridge.AppServerBridgeToPega.invokeMethodPropagatingThrowable(AppServerBridgeToPega.java:216)
19:06:49,540 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.pega.pegarules.boot.internal.extbridge.AppServerBridgeToPega.invokeMethod(AppServerBridgeToPega.java:265)
19:06:49,540 STDERR [stderr] (asxxxxxx.intranet.fw)     at com.pega.pegarules.internal.web.servlet.WebAppLifeCycleListenerBoot.contextInitialized(WebAppLifeCycleListenerBoot.java:83)
19:06:49,540 STDERR [stderr] (asxxxxxx.intranet.fw)     at org.apache.catalina.core.StandardContext.contextListenerStart(StandardContext.java:3339)
19:06:49,540 STDERR [stderr] (asxxxxxx.intranet.fw)     at org.apache.catalina.core.StandardContext.start(StandardContext.java:3780)
19:06:49,541 STDERR [stderr] (asxxxxxx.intranet.fw)     at org.jboss.as.web.deployment.WebDeploymentService.doStart(WebDeploymentService.java:163)
19:06:49,541 STDERR [stderr] (asxxxxxx.intranet.fw)     at org.jboss.as.web.deployment.WebDeploymentService.access$000(WebDeploymentService.java:61)
19:06:49,541 STDERR [stderr] (asxxxxxx.intranet.fw)     at org.jboss.as.web.deployment.WebDeploymentService$1.run(WebDeploymentService.java:96)
19:06:49,542 STDERR [stderr] (asxxxxxx.intranet.fw)     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
19:06:49,542 STDERR [stderr] (asxxxxxx.intranet.fw)     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
19:06:49,542 STDERR [stderr] (asxxxxxx.intranet.fw)     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
19:06:49,542 STDERR [stderr] (asxxxxxx.intranet.fw)     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
19:06:49,543 STDERR [stderr] (asxxxxxx.intranet.fw)     at java.lang.Thread.run(Thread.java:745)
19:06:49,543 STDERR [stderr] (asxxxxxx.intranet.fw)     at org.jboss.threads.JBossThread.run(JBossThread.java:122)
19:06:49,762 INFO [org.jboss.as.server] (Controller Boot Thread) JBAS015859: Deployed "SFDCGATEWAY.war" (runtime-name : "SFDCGATEWAY.war")
19:06:49,762 INFO [org.jboss.as.server] (Controller Boot Thread) JBAS015859: Deployed "prgateway.war" (runtime-name : "prgateway.war")
19:06:49,762 INFO [org.jboss.as.server] (ServerService Thread Pool -- 34) JBAS015859: Deployed "prweb.war" (runtime-name : "prweb.war")
19:06:49,763 INFO [org.jboss.as.server] (ServerService Thread Pool -- 34) JBAS015859: Deployed "prhelp.war" (runtime-name : "prhelp.war")
19:06:49,772 INFO [org.jboss.as] (Controller Boot Thread) JBAS015961: Http management interface listening on http://xx.xx.xx.xxx:9990/management
19:06:49,772 INFO [org.jboss.as] (Controller Boot Thread) JBAS015951: Admin console listening on http://xx.xx.xx.xxx:9990
19:06:49,773 INFO [org.jboss.as] (Controller Boot Thread) JBAS015874: JBoss EAP 6.4.21.GA (AS 7.5.21.Final-redhat-1) started in 351711ms - Started 533 of 559 services (90 services are lazy, passive or on-demand)
19:15:51,622 INFO [com.hazelcast.nio.SocketAcceptor] (hz._hzInstance_1_668e03f97c14a138d2c4267ce8ce03b2.IO.thread-Acceptor) [xx.xx.xx.xxx]:5702 [668e03f97c14a138d2c4267ce8ce03b2] [3.2] Accepting socket connection from /xx.xx.xx.xxx:40515
19:15:51,624 INFO [com.hazelcast.nio.TcpIpConnectionManager] (hz._hzInstance_1_668e03f97c14a138d2c4267ce8ce03b2.IO.thread-Acceptor) [xx.xx.xx.xxx]:5702 [668e03f97c14a138d2c4267ce8ce03b2] [3.2] 5702 accepted socket connection from /xx.xx.xx.xxx:40515
19:17:51,954 INFO [com.hazelcast.nio.TcpIpConnection] (hz._hzInstance_1_668e03f97c14a138d2c4267ce8ce03b2.IO.thread-in-2) [xx.xx.xx.xxx]:5702 [668e03f97c14a138d2c4267ce8ce03b2] [3.2] Connection [/xx.xx.xx.xxx:40515] lost. Reason: java.io.EOFException[Remote socket closed!]

***Edited by Moderator Marije to change Type from Discussion to Question***

***Edited by Moderator Marije to add Support Case Details***

Pega Platform 7.1.7

System Administration

Installation and Deployment

Data Model

Solutions Consultant

Support Case Exists

Accepted Solution

Posted: 31 Aug 2022 9:15 EDT
Updated: 18 Oct 2022 14:03 EDT

MarijeSchillern

MOD

replied to FastwebPegaS

@FastwebPegaS this issue cannot be processed any further by our support team due to lack of response.

The analysis of your issue is as follows.

The logs show the following errors/exceptions:

Caused by: com.hazelcast.spi.exception.PartitionMigratingException: Partition is migrating! this:Address[WIPED DATA]:5701, partitionId: 3, operation: com.hazelcast.map.operation.MapKeySetOperation, service: hz:impl:mapService

Retrying invocation: BasicInvocation{ serviceName='hz:impl:mapService', op=GetOperation{}, partitionId=45, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=110, callTimeout=60000, target=null}, Reason: com.hazelcast.spi.exception.WrongTargetException: WrongTarget! this:Address[[WIPED DATA]:5702, target:null, partitionId: 45, replicaIndex: 0, operation: com.hazelcast.map.operation.GetOperation, service: hz:impl:mapService

Based on the above stack trace, the target is null and the partition is having an issue. Specifically, when the target is null, this message means that this particular member does not have the owner set for a specific partition: the member did not get its partition table updated in time (a request was made before the member was informed where the data lives in the grid).
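To see how often each condition occurs, the "Retrying invocation" lines can be pulled out of the log and tallied. The following is a minimal sketch, not a Pega or Hazelcast tool: the regular expression is written against the two retry lines quoted in this thread (Hazelcast 3.2 format) and may need adjusting for other versions, and the function name `summarize_retries` is hypothetical.

```python
import re
from collections import Counter

# Assumption: the line layout matches the "Retrying invocation" lines quoted
# in this thread (Hazelcast 3.2); other versions may format them differently.
RETRY_RE = re.compile(
    r"Retrying invocation: BasicInvocation\{.*?"
    r"partitionId=(?P<partition>\d+).*?"
    r"invokeCount=(?P<invoke_count>\d+).*?"
    r"target=(?P<target>[^}]+)\}, "
    r"Reason: (?P<reason>[\w.$]+)"
)


def summarize_retries(lines):
    """Count retry reasons and collect partitions whose target is null.

    Returns (reasons, null_target_partitions), where reasons is a Counter
    keyed by the short exception class name.
    """
    reasons = Counter()
    null_target_partitions = set()
    for line in lines:
        match = RETRY_RE.search(line)
        if match is None:
            continue  # not a retry line
        # Keep only the class name, e.g. "WrongTargetException".
        short_reason = match.group("reason").rsplit(".", 1)[-1]
        reasons[short_reason] += 1
        if match.group("target") == "null":
            null_target_partitions.add(int(match.group("partition")))
    return reasons, null_target_partitions
```

It could be called on an open log file, for example `summarize_retries(open("server.log"))`, where the file name is only a placeholder for your node's log. A long list of partitions with a null target, repeated across restarts, points to the missing-owner condition described above rather than a one-off.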

What it means: In a healthy cluster, this should rarely occur, as Hazelcast has delivered fixes in past releases that prevent the race condition between looking for data and getting the updated partition table information. In a split-brain situation, when the cluster is fractured into many smaller clusters, partitions are lost (since some partitions may only have existed on nodes that are no longer part of a splintered group of nodes).

Frequent fracturing and merging also delays partition table updates.

What to do:

In a healthy cluster, this should be a one-off and can safely be ignored. When the error is seen multiple times, it may indicate that the cluster is experiencing fracturing. Also check to make sure that there are no ports blocked by your firewall.
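For the firewall check, a quick TCP probe from each node to the others can show whether the Hazelcast ports are reachable. This is a minimal sketch: ports 5701 and 5702 are taken from the log lines in this thread and your cluster may use others, the function names are hypothetical, and a successful connection only shows that the port is not blocked, not that the cluster is healthy.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_members(hosts, ports=(5701, 5702), timeout=3.0):
    """Map each (host, port) pair to whether it accepted a connection.

    The default ports are an assumption based on the logs in this thread.
    """
    return {
        (host, port): port_open(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

Run it on each node with the addresses of the other nodes, for example `check_members(["node-a.example", "node-b.example"])` (placeholder host names). Any pair that reports False in one direction only is worth raising with the network team.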

You can try the steps below to fix this issue for now and check whether they help:

Take a DB backup.
Bring all nodes down.
Truncate the pr_sys_statusnodes table (only after the DB backup has been taken).
Bring up one node (preferably an index host node).
Bring up the remaining nodes, in parallel or in series as you wish.

Note: It is recommended to perform the above steps outside business hours, after taking the DB backup.

If you still see the issue after performing the steps above, support has asked that you provide the following details:

Cluster logs and Pega logs
Prconfig details
Hfix scan reports

Posted: 2 Aug 2022 4:57 EDT
Updated: 2 Aug 2022 5:02 EDT

MarijeSchillern

MOD

replied to FastwebPegaS

@FastwebPegaS please can you confirm that you logged INC-235356 support ticket for this? This will help the moderators track the issue and follow it to conclusion.

This type of error can be seen during search initiation. It is a known issue with sharding management during rolling restarts. This may have nothing to do with the startup except that search is unlikely to operate in this environment. Shutting down all nodes, emptying the index directory, and reindexing should resolve the search initialization issue. If the problem still appears to be a search issue, please share the thread dumps from the node startup with our support team.

The best course of action is to wait for our support team to analyse your logs and help you further.
