Question
JPMC
US
Last activity: 3 Jul 2017 8:10 EDT
Server startup taking 1 hour with 30 minutes spent during "Starts Initializing Search Infrastructure" step
Hi All - I recently synced one environment (single node) by copying data from another environment (multi-node). Server startup now takes about 1 hour. At a high level, 30 minutes are spent during "Starts Initializing Search Infrastructure" and 6 minutes during "checking obsolete triggers". We are using v7.1.9, Oracle 12c and WebLogic 10.3.6. Before the restore, startup took at most 10 minutes, and the JVM settings have not changed. I deleted all system node records from pr_data_admin, and as of now no node is specified for Elasticsearch under System --> Settings --> Search in the target environment.
Can someone help me resolve this issue or tell me which area to investigate?
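Since the slow phases were identified from log timestamps, one way to narrow down where startup time goes is to scan the startup log for long gaps between consecutive timestamped lines. Below is a minimal sketch. It assumes log lines begin with a `yyyy-MM-dd HH:mm:ss,SSS` timestamp, as in the thread dump excerpt later in this thread. The function names and the 5-minute threshold are illustrative, not part of any Pega tooling.

```python
from datetime import datetime, timedelta

# Assumed timestamp prefix of each log line, e.g. "2016-09-06 12:37:07,042 [...]"
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = len("2016-09-06 12:37:07,042")


def parse_ts(line):
    """Return the datetime at the start of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Yield (gap, previous_line, next_line) wherever two consecutive
    timestamped lines are more than `threshold` apart."""
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # continuation lines (stack frames etc.) carry no timestamp
        if prev_ts is not None and ts - prev_ts > threshold:
            yield ts - prev_ts, prev_line.rstrip(), line.rstrip()
        prev_ts, prev_line = ts, line
```

Passing an open log file, for example `find_gaps(open(path))`, lists each stall together with the last line logged before it. That last line is usually the best hint about which startup step is waiting.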
Accepted Solution
JPMC
US
It was due to missing indexes on tables after the restore.
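Because the root cause was indexes missing after the restore, one way to spot them is to compare index definitions between the source and target schemas. The sketch below is minimal and assumes the rows have already been exported from each schema. One way to get them is Oracle's USER_IND_COLUMNS view (TABLE_NAME, INDEX_NAME, COLUMN_NAME, COLUMN_POSITION). It compares indexes by table and ordered column list rather than by name, because system-generated index names can differ between environments. The function names and sample table/index names are hypothetical.

```python
from collections import defaultdict


def index_signatures(rows):
    """Group (table, index, column, position) rows into a set of
    (table, (col1, col2, ...)) signatures, ignoring index names."""
    cols = defaultdict(list)
    for table, index, column, position in rows:
        cols[(table.upper(), index.upper())].append((int(position), column.upper()))
    return {
        (table, tuple(col for _, col in sorted(parts)))
        for (table, _index), parts in cols.items()
    }


def missing_indexes(source_rows, target_rows):
    """Return index signatures present in the source schema but not in the target."""
    return sorted(index_signatures(source_rows) - index_signatures(target_rows))
```

This only compares column lists. It does not check uniqueness, index type, or tablespace, so treat its output as a starting point for a manual review of the DDL.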
Virtusa IT Consulting
AE
What's the PRPC version? Do you see any thread dumps generated in the logs during server startup?
Pegasystems
IN
If you have cloned the environment and the other environment was using full text search, then you should truncate pr_sys_statusnodes table before starting the weblogic nodes on the new setup. Otherwise, the current cluster will try to connect to the other cluster's ElasticSearch leading to index corruption and start up time overhead in the current cluster.
This applies to both clustering technologies used in the Pega platform (Hazelcast and ElasticSearch).
I suggest you try this option and test.
JPMC
US
Hi Rajiv - Thank you for your response! I tried truncating pr_sys_statusnodes and also deleting the extract marker file, and startup then took around 1 hour 20 minutes (20 minutes more because the extract marker was deleted), so it didn't help.
We are on v7.1.9, and I don't see any thread dumps in the log.
Pegasystems
IN
I am not sure why you deleted the extract marker file. Did you try a restart again to see if this time it is faster (without making any changes)?
Would it be possible to share the following?
- Dump of the pr_sys_statusnodes table
- The full PegaRULES log file
- The values of the following Data-Admin-System-Settings instances:
  - indexing/distributed/enabled
  - indexing/distributed/index_enabled
  - indexing/distributed/search_enabled
JPMC
US
Hi Rajiv - Please find below the requested information:
1. Dump of the pr_sys_statusnodes table: it currently has just one row, which I have added to the log file itself.
2. I have attached the full nohup.out instead. Please let me know if you also need the PegaRULES log file.
3. All three DSS settings have the value true.
Pegasystems
IN
Hi,
I took a look at the pr_sys_statusnodes entry and found that pyClusterAddress is " Proprietary information hidden:50032" and pyIndexerAddress is " Proprietary information hidden:9301". The same node entry appears to have two different IP addresses. Does this machine have two network cards? If so, which IP address should be used? If the pyClusterAddress is correct, then the pyIndexerAddress needs to be fixed, which can be done by overriding the prconfig entry. You can find the details here - https://community.pega.com/support/support-articles/search-returns-empty-results.
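The check described in this reply compares the host part of the two addresses stored in one pr_sys_statusnodes row. It can be sketched as follows. This minimal illustration assumes both columns hold plain IPv4 `host:port` strings, possibly with surrounding spaces, like the redacted values quoted above. The function names and sample addresses are hypothetical. The sketch only detects the mismatch and does not change the prconfig setting.

```python
def split_host_port(address):
    """Split a 'host:port' string (IPv4 assumed) into (host, port),
    ignoring surrounding whitespace."""
    host, sep, port = address.strip().rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port, got {address!r}")
    return host, int(port)


def addresses_consistent(cluster_address, indexer_address):
    """True when the cluster address and the indexer address of a node
    entry point at the same host (ports are expected to differ)."""
    return split_host_port(cluster_address)[0] == split_host_port(indexer_address)[0]
```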
-Rajiv
JPMC
US
Hi baigh - Today I do see some thread dumps in the logs (when trying to Save As a rule) that are waiting on search threads. I am not sure whether this is related to the startup issue, where startup spends more than 30 minutes on the "Starts Initializing Search Infrastructure" step:
***************************************************************************************************************
2016-09-06 12:37:07,042 [fault (self-tuning)'] [ ] [ ] (.timers.EnvironmentDiagnostics) INFO MYSERV1|10.1881.611.111 - --- Thread Dump Starts ---
Full Java thread dump with locks info
"PegaRULES-Search[search][T#1]" Id=221 in WAITING on lock=com.pega.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue@45938b63
BlockedCount : 0, BlockedTime : -1, WaitedCount : 2, WaitedTime : -1
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at com.pega.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)
at com.pega.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
at com.pega.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)
at com.pega.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Locked synchronizers: count = 0
"PegaRULES-Search[generic][T#12]" Id=220 in TIMED_WAITING on lock=java.util.concurrent.SynchronousQueue$TransferStack@74af3666
BlockedCount : 0, BlockedTime : -1, WaitedCount : 221, WaitedTime : -1
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Locked synchronizers: count = 0
"OracleTimeoutPollingThread" Id=219 in TIMED_WAITING
BlockedCount : 0, BlockedTime : -1, WaitedCount : 11659, WaitedTime : -1
at java.lang.Thread.sleep(Native Method)
at oracle.jdbc.driver.OracleTimeoutPollingThread.run(OracleTimeoutPollingThread.java:150)
Locked synchronizers: count = 0
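When reading dumps like the one above, it can help to tally thread states and to separate idle pool workers from threads that are genuinely blocked. A thread parked inside ThreadPoolExecutor.getTask is a pool worker waiting for its next task. Both PegaRULES-Search threads in this excerpt are in that state. Below is a minimal sketch, assuming the header format shown above (`"name" Id=N in STATE ...`). The function name is hypothetical.

```python
import re
from collections import Counter

# Header line of each thread in the dump format shown above, e.g.
# "PegaRULES-Search[search][T#1]" Id=221 in WAITING on lock=...
HEADER = re.compile(r'^"(?P<name>.+)" Id=(?P<id>\d+) in (?P<state>[A-Z_]+)')


def summarize_dump(lines):
    """Return (state_counts, idle_workers) for a thread dump.

    idle_workers lists threads whose stack contains
    ThreadPoolExecutor.getTask, i.e. pool workers waiting for a new task."""
    states = Counter()
    idle = []
    current = None
    for line in lines:
        match = HEADER.match(line.strip())
        if match:
            current = match.group("name")
            states[match.group("state")] += 1
        elif current and "ThreadPoolExecutor.getTask" in line and current not in idle:
            idle.append(current)
    return states, idle
```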
hcl
IN
Hi DP Singh,
Can you elaborate on which tables were missing indexes? Did you infer that from the thread dumps?
Thanks
Suraj