Question
Accenture
JP
Last activity: 8 Feb 2023 1:02 EST
How to create multi-server systems using AWS AutoScalingGroups
Hi,
We would like to create a multi-server system using AWS AutoScalingGroups. We created an AMI (machine image) of a Pega server and launched two servers that share the same data, i.e. the same AMI (a minimal boto3 sketch of this setup is included further below). We then hit the error shown at (*). We raised incident INC-257543 with Pega, and the error was temporarily resolved with the following four troubleshooting steps:

1. Bring all nodes down.
2. Truncate the Stream service tables in the Pega Platform data schema:
   truncate table pr_data_stream_nodes
   truncate table pr_data_stream_sessions
   truncate table pr_data_stream_node_updates
3. Delete the Kafka-data directory from the Stream instances.
4. Bring the nodes back up one instance at a time.

However, this workaround is not suitable for a production environment: the error recurs every time we restore the servers, and each time we have to truncate all the Stream service tables again (a sketch automating steps 2 and 3 is also included below). Please let me know if anyone knows what needs to be configured or considered when using AWS AutoScalingGroups.
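For reference, this is roughly how we build the two identical servers. It is only a minimal boto3 sketch of the setup described above; the template name, AMI ID, instance type, group name, and subnet IDs are placeholders, not our real values.

import boto3

ec2 = boto3.client("ec2")
autoscaling = boto3.client("autoscaling")

# Launch template pointing at the AMI baked from the Pega server.
ec2.create_launch_template(
    LaunchTemplateName="pega-stream-node",          # placeholder name
    LaunchTemplateData={
        "ImageId": "ami-0123456789abcdef0",         # placeholder Pega AMI
        "InstanceType": "m5.xlarge",                # placeholder size
    },
)

# Auto Scaling group that keeps two servers running from the same AMI.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="pega-stream-asg",         # placeholder name
    LaunchTemplate={
        "LaunchTemplateName": "pega-stream-node",
        "Version": "$Latest",
    },
    MinSize=2,
    MaxSize=2,
    DesiredCapacity=2,
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",  # placeholder subnets
)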
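And this is the interim cleanup we script for steps 2 and 3 above, sketched in Python. It assumes a PostgreSQL data schema reachable with psycopg2; the schema name, connection string, and Kafka-data path are placeholders and will differ per environment. Steps 1 and 4 (stopping the nodes and restarting them one at a time) are still done by hand.

import shutil
import psycopg2

# Stream service tables named in step 2 above.
STREAM_TABLES = [
    "pr_data_stream_nodes",
    "pr_data_stream_sessions",
    "pr_data_stream_node_updates",
]

KAFKA_DATA_DIR = "/opt/pega/kafka-data"  # placeholder path on the Stream instance

def truncate_stream_tables(dsn: str, schema: str = "pegadata") -> None:
    """Step 2: truncate the Stream service tables in the Pega data schema."""
    # The connection context manager commits the transaction on successful exit.
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            for table in STREAM_TABLES:
                cur.execute(f"TRUNCATE TABLE {schema}.{table}")

def delete_kafka_data() -> None:
    """Step 3: delete the Kafka-data directory on the Stream instance."""
    shutil.rmtree(KAFKA_DATA_DIR, ignore_errors=True)

if __name__ == "__main__":
    # Run only while all nodes are down (step 1); bring the nodes back up
    # one instance at a time afterwards (step 4).
    truncate_stream_tables("dbname=pega user=pega host=db.example.com")  # placeholder DSN
    delete_kafka_data()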
(*) Error:
2023-01-23 14:32:32,843 [StreamServer.Default] [ STANDARD] [ ] [ ] ( dsm.kafka.Kafka) ERROR - Failed to start Kafka on 1 attempt, kafka log
[2023-01-23 14:27:37,410] ERROR Error while creating ephemeral at /brokers/ids/1001, node already exists and owner '37679265661976577' does not match current session '-2949119' (kafka.zk.KafkaZkClient$CheckedEphemeral)
[2023-01-23 14:27:37,410] ERROR [KafkaServer id=1001] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:119) ~[zookeeper-3.4.10.jar:3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f]
 at kafka.zk.KafkaZkClient.checkedEphemeralCreate(KafkaZkClient.scala:1476) ~[kafka_2.11- Proprietary information hidden.jar:?]
 at kafka.zk.KafkaZkClient.registerBrokerInZk(KafkaZkClient.scala:84) ~[kafka_2.11- Proprietary information hidden.jar:?]
 at kafka.server.KafkaServer.startup(KafkaServer.scala:254) [kafka_2.11- Proprietary information hidden.jar:?]
 at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38) [kafka_2.11- Proprietary information hidden.jar:?]
 at kafka.Kafka$.main(Kafka.scala:75) [kafka_2.11- Proprietary information hidden.jar:?]
 at kafka.Kafka.main(Kafka.scala) [kafka_2.11- Proprietary information hidden.jar:?]
[2023-01-23 14:27:40,653] ERROR Exiting Kafka. (kafka.server.KafkaServerStartable)

2023-01-23 14:37:37,237 [StreamServer.Default] [ STANDARD] [ ] [ ] (rvice.operation.StartOperation) ERROR - Cannot start service [StreamServer.Default]. Will retry in 180 seconds. Remaining attempts: 2
com.pega.dsm.dnode.api.StreamServiceException: Unable to start Kafka broker. Last state was: NotConnected
[2023-01-23 14:32:38,865] ERROR Error while creating ephemeral at /brokers/ids/1001, node already exists and owner '37679265661976577' does not match current session '-2949118' (kafka.zk.KafkaZkClient$CheckedEphemeral)
[2023-01-23 14:32:38,865] ERROR [KafkaServer id=1001] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:119) ~[zookeeper-3.4.10.jar:3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f]
 at kafka.zk.KafkaZkClient.checkedEphemeralCreate(KafkaZkClient.scala:1476) ~[kafka_2.11- Proprietary information hidden.jar:?]
 at kafka.zk.KafkaZkClient.registerBrokerInZk(KafkaZkClient.scala:84) ~[kafka_2.11- Proprietary information hidden.jar:?]
 at kafka.server.KafkaServer.startup(KafkaServer.scala:254) [kafka_2.11- Proprietary information hidden.jar:?]
 at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38) [kafka_2.11- Proprietary information hidden.jar:?]
 at kafka.Kafka$.main(Kafka.scala:75) [kafka_2.11- Proprietary information hidden.jar:?]
 at kafka.Kafka.main(Kafka.scala) [kafka_2.11- Proprietary information hidden.jar:?]
[2023-01-23 14:32:42,091] ERROR Exiting Kafka. (kafka.server.KafkaServerStartable)
 at com.pega.dsm.kafka.Kafka.bootstrap(Kafka.java:273) ~[d-node.jar:?]
 at com.pega.dsm.dnode.api.server.StreamServerService$StreamServiceStartOperation$2.emit(StreamServerService.java:667) ~[d-node.jar:?]
 at com.pega.dsm.dnode.impl.stream.DataObservableImpl$SafeDataSubscriber.subscribe(DataObservableImpl.java:353) ~[d-node.jar:?]
 at com.pega.dsm.dnode.impl.stream.DataObservableImpl.subscribe(DataObservableImpl.java:55) ~[d-node.jar:?]
 at com.pega.dsm.dnode.impl.stream.DataObservableImpl.await(DataObservableImpl.java:117) ~[d-node.jar:?]
 at com.pega.dsm.dnode.impl.stream.DataObservableImpl.await(DataObservableImpl.java:106) ~[d-node.jar:?]
 at com.pega.dsm.dnode.api.prpc.service.operation.StartOperation$1.execute(StartOperation.java:167) ~[d-node.jar:?]
 at com.pega.dsm.dnode.util.OperationWithLock$LockingOperation.couldAcquireLock(OperationWithLock.java:190) ~[d-node.jar:?]
 at com.pega.dsm.dnode.util.OperationWithLock$LockingOperation.performLockOperation(OperationWithLock.java:157) ~[d-node.jar:?]
 at com.pega.dsm.dnode.util.OperationWithLock$LockingOperation.access$200(OperationWithLock.java:102) ~[d-node.jar:?]
 at com.pega.dsm.dnode.util.OperationWithLock.doWithLock(OperationWithLock.java:99) ~[d-node.jar:?]
 at com.pega.dsm.dnode.util.OperationWithLock.doWithLock(OperationWithLock.java:95) ~[d-node.jar:?]
 at com.pega.dsm.dnode.impl.prpc.service.ServiceHelper.executeWithLockInternal(ServiceHelper.java:273) ~[d-node.jar:?]
 at com.pega.dsm.dnode.impl.prpc.service.ServiceHelper.executeWithLock(ServiceHelper.java:221) ~[d-node.jar:?]
 at com.pega.dsm.dnode.api.prpc.service.operation.StartOperation.doActualServerStart(StartOperation.java:164) ~[d-node.jar:?]
 at com.pega.dsm.dnode.api.prpc.service.operation.StartOperation.performStartupWithRetries(StartOperation.java:137) ~[d-node.jar:?]
 at com.pega.dsm.dnode.api.prpc.service.operation.StartOperation.initializeServerMode(StartOperation.java:117) ~[d-node.jar:?]
 at com.pega.dsm.dnode.api.prpc.service.operation.StartOperation.lambda$bootstrap$0(StartOperation.java:85) ~[d-node.jar:?]
 at com.pega.dsm.dnode.impl.stream.DataObservableImpl$SafeDataSubscriber.subscribe(DataObservableImpl.java:353) ~[d-node.jar:?]
 at com.pega.dsm.dnode.impl.stream.DataObservableImpl.subscribe(DataObservableImpl.java:55) ~[d-node.jar:?]
 at com.pega.dsm.dnode.impl.stream.DataObservableImpl.await(DataObservableImpl.java:117) ~[d-node.jar:?]
 at com.pega.dsm.dnode.impl.stream.DataObservableImpl.await(DataObservableImpl.java:106) ~[d-node.jar:?]
 at com.pega.dsm.dnode.impl.prpc.service.ServiceDefinition.startService(ServiceDefinition.java:81) ~[d-node.jar:?]
 at com.pega.dsm.dnode.impl.prpc.service.ServiceDefinition.start(ServiceDefinition.java:66) ~[d-node.jar:?]
 at com.pega.dsm.dnode.api.prpc.service.ServiceManager$4.run(ServiceManager.java:429) ~[d-node.jar:?]
 at com.pega.dsm.dnode.util.PrpcRunnable.execute(PrpcRunnable.java:67) ~[d-node.jar:?]
 at com.pega.dsm.dnode.impl.prpc.service.ServiceHelper.executeInPrpcContextInternal(ServiceHelper.java:305) ~[d-node.jar:?]
 at com.pega.dsm.dnode.impl.prpc.service.ServiceHelper.executeInPrpcContext(ServiceHelper.java:150) ~[d-node.jar:?]
 at com.pega.dsm.dnode.api.prpc.service.ServiceManager.startServiceDefinition(ServiceManager.java:426) ~[d-node.jar:?]
 at com.pega.dsm.dnode.api.prpc.service.ServiceManager.lambda$bootstrap$3(ServiceManager.java:388) ~[d-node.jar:?]
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
 at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) ~[guava-19.0.jar:?]
 at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41) ~[guava-19.0.jar:?]
 at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) ~[guava-19.0.jar:?]
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
 at com.pega.dsm.dnode.util.PrpcRunnable$1.run(PrpcRunnable.java:59) ~[d-node.jar:?]
 at com.pega.dsm.dnode.util.PrpcRunnable$1.run(PrpcRunnable.java:56) ~[d-node.jar:?]
 at com.pega.dsm.dnode.util.PrpcRunnable.execute(PrpcRunnable.java:67) ~[d-node.jar:?]
 at com.pega.dsm.dnode.impl.prpc.PrpcThreadFactory$PrpcThread.run(PrpcThreadFactory.java:124) ~[d-node.jar:?]