Server startup taking 1 hour with 30 minutes spent during "Starts Initializing Search Infrastructure" step

Question

DPSSingh

Member since 2009

53 posts

JPMC

Posted: 31 Aug 2016 21:31 EDT
Last activity: 3 Jul 2017 8:10 EDT

Closed

Solved

Hi All - I recently synced one environment (single node) by copying data from another environment (multi-node). Server startup now takes 1 hour; at a high level, 30 minutes are spent in "Starts Initializing Search Infrastructure" and 6 minutes in checking obsolete triggers. We are using v7.1.9, Oracle 12c, and WebLogic 10.3.6. Before the restore it took at most 10 minutes to come up, and the JVM settings didn't change. I deleted all system node records from pr_data_admin, and as of now I don't have any node specified under System --> Settings --> Search for Elasticsearch in the target environment.

Can someone help me resolve this issue or let me know which area to investigate?

Data Integration

System Administration

Accepted Solution

Posted: 8 Sep 2016 19:51 EDT

DPSSingh

JPMC

replied to DPSSingh

It was due to missing indexes on tables after the restore.
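
The thread does not say which tables were affected. One way to find indexes that went missing in a restore is to export the index list from both environments (on Oracle, for example, from the USER_INDEXES dictionary view) and diff the two. A minimal Python sketch, assuming each export has already been loaded as (table_name, index_name) pairs; the function name and the sample index names are hypothetical and do not come from this thread:

```python
from collections import defaultdict


def missing_indexes(source_rows, target_rows):
    """Return {table: sorted index names} found in source but not in target.

    Names are compared case-insensitively, since exports may differ in case.
    """
    target = {(t.upper(), i.upper()) for t, i in target_rows}
    missing = defaultdict(set)
    for table, index in source_rows:
        key = (table.upper(), index.upper())
        if key not in target:
            missing[key[0]].add(key[1])
    return {table: sorted(names) for table, names in missing.items()}


# Hypothetical sample rows; real input would come from each environment's export.
source = [("PR_SYS_STATUSNODES", "IDX_A"), ("PR_DATA_ADMIN", "IDX_B")]
target = [("pr_sys_statusnodes", "idx_a")]

print(missing_indexes(source, target))
```

Any table reported here is a candidate for rebuilding its indexes on the target before the next restart.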

Posted: 1 Sep 2016 2:32 EDT

BaigHabeeb

Virtusa IT Consulting

replied to DPSSingh

What's the PRPC version? Do you see any thread dumps generated in the logs during server startup?

Posted: 1 Sep 2016 2:51 EDT

nistr replied to DPSSingh

If you cloned the environment and the other environment was using full-text search, truncate the pr_sys_statusnodes table before starting the WebLogic nodes on the new setup. Otherwise, the current cluster will try to connect to the other cluster's Elasticsearch, leading to index corruption and startup-time overhead in the current cluster.

This is important for both clustering technologies used in the Pega platform (Hazelcast and Elasticsearch).

I suggest you try this option and test.

Posted: 1 Sep 2016 16:31 EDT

DPSSingh

JPMC

replied to DPSSingh

Hi Rajiv - Thank you for your response! I tried truncating pr_sys_statusnodes and also deleting the extract marker file, and startup now took around 1 hour 20 minutes (20 minutes more because of deleting the extract marker), so it didn't help.

We are on v7.1.9, and I don't see any thread dumps in the log.

Posted: 2 Sep 2016 3:00 EDT

nistr replied to DPSSingh

I am not sure why you deleted the extract marker file. Did you try a restart again to see if this time it is faster (without making any changes)?

Would it be possible to share the following?

1. A dump of the pr_sys_statusnodes table
2. The full PegaRULES log file
3. The values of the following Data-Admin-System-Settings instances:
   indexing/distributed/enabled
   indexing/distributed/index_enabled
   indexing/distributed/search_enabled

Posted: 8 Sep 2016 19:50 EDT

DPSSingh

JPMC

replied to nistr

Hi Rajiv - Please find the requested information below:

1. Dump of the pr_sys_statusnodes table - it currently has just one row. I have added that row to the log file itself.

2. I have attached the full nohup.out instead. Please let me know if you also need the PegaRULES log file.

3. All 3 DSS settings have the value true.

Posted: 7 Sep 2016 13:05 EDT

nistr replied to DPSSingh

Hi,

I took a look at the pr_sys_statusnodes entry and found that pyClusterAddress is "Proprietary information hidden:50032" and pyIndexerAddress is "Proprietary information hidden:9301". It looks like the same node entry has two different IP addresses. Does this machine have two network cards? If so, which IP address should we be using? If pyClusterAddress is correct, then pyIndexerAddress needs to be fixed, which can be done by overriding the prconfig entry. You can find the details here: https://community.pega.com/support/support-articles/search-returns-empty-results

-Rajiv
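
The check described above (do the two addresses on one node row point at the same host?) can be sketched in Python. This is only an illustration: the function names are hypothetical, the sample IPs are documentation-range placeholders because the real values are hidden in this thread, and the parsing assumes plain "host:port" strings (IPv6 literals are not handled).

```python
def host_of(address):
    """Return the host part of a 'host:port' string, ignoring surrounding spaces."""
    text = address.strip()
    host, sep, _port = text.rpartition(":")
    # With no ':' present, treat the whole string as the host.
    return host if sep else text


def addresses_disagree(cluster_address, indexer_address):
    """True when the two addresses of one node row name different hosts."""
    return host_of(cluster_address) != host_of(indexer_address)


# Placeholder values standing in for pyClusterAddress and pyIndexerAddress.
py_cluster_address = " 192.0.2.10:50032"
py_indexer_address = " 198.51.100.7:9301"

if addresses_disagree(py_cluster_address, py_indexer_address):
    print("pyClusterAddress and pyIndexerAddress point at different hosts")
```

A mismatch is expected on a machine with two network interfaces; it only tells you which row to look at, not which address is the right one.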

Posted: 6 Sep 2016 12:57 EDT

DPSSingh

JPMC

replied to DPSSingh

Hi baigh - Today I do see some thread dumps in the logs (taken when trying to Save As a rule) that show waits on search threads. I'm not sure whether this is related to the server startup issue, where more than 30 minutes is spent in the "Starts Initializing Search Infrastructure" step:

***************************************************************************************************************

2016-09-06 12:37:07,042 [fault (self-tuning)'] [          ] [                    ] (.timers.EnvironmentDiagnostics) INFO MYSERV1|10.1881.611.111 - --- Thread Dump Starts ---
Full Java thread dump with locks info
"PegaRULES-Search[search][T#1]" Id=221 in WAITING on lock=com.pega.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue@45938b63
BlockedCount : 0, BlockedTime : -1, WaitedCount : 2, WaitedTime : -1
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at com.pega.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.awaitMatch(LinkedTransferQueue.java:706)
    at com.pega.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.xfer(LinkedTransferQueue.java:615)
    at com.pega.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue.take(LinkedTransferQueue.java:1109)
    at com.pega.elasticsearch.common.util.concurrent.SizeBlockingQueue.take(SizeBlockingQueue.java:162)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

    Locked synchronizers: count = 0

"PegaRULES-Search[generic][T#12]" Id=220 in TIMED_WAITING on lock=java.util.concurrent.SynchronousQueue$TransferStack@74af3666
BlockedCount : 0, BlockedTime : -1, WaitedCount : 221, WaitedTime : -1
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
    at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
    at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
    at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

    Locked synchronizers: count = 0

"OracleTimeoutPollingThread" Id=219 in TIMED_WAITING
BlockedCount : 0, BlockedTime : -1, WaitedCount : 11659, WaitedTime : -1
    at java.lang.Thread.sleep(Native Method)
    at oracle.jdbc.driver.OracleTimeoutPollingThread.run(OracleTimeoutPollingThread.java:150)

    Locked synchronizers: count = 0
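
For a long dump in this format, it can help to tally threads by state before reading individual stacks. A minimal Python sketch, assuming the header layout seen above ("name" Id=N in STATE); the function names are hypothetical and the sample reuses the three header lines from this dump:

```python
import re
from collections import Counter

# Matches header lines such as: "name" Id=221 in WAITING on lock=...
HEADER = re.compile(r'^"(?P<name>[^"]+)" Id=(?P<id>\d+) in (?P<state>[A-Z_]+)', re.MULTILINE)


def thread_states(dump_text):
    """Return (thread name, state) pairs for every header line in the dump."""
    return [(m["name"], m["state"]) for m in HEADER.finditer(dump_text)]


def count_states(dump_text):
    """Count how many threads are in each state."""
    return Counter(state for _name, state in thread_states(dump_text))


sample = '''"PegaRULES-Search[search][T#1]" Id=221 in WAITING on lock=com.pega.elasticsearch.common.util.concurrent.jsr166y.LinkedTransferQueue@45938b63
"PegaRULES-Search[generic][T#12]" Id=220 in TIMED_WAITING on lock=java.util.concurrent.SynchronousQueue$TransferStack@74af3666
"OracleTimeoutPollingThread" Id=219 in TIMED_WAITING
'''

print(count_states(sample))
```

Note that pool workers parked in ThreadPoolExecutor.getTask, like the two PegaRULES-Search threads above, are waiting for work to arrive, so on their own they do not indicate a hang.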

Posted: 3 Jul 2017 8:10 EDT

surajs6226

hcl

replied to DPSSingh

Hi DP Singh,

Can you elaborate on which tables the indexes were missing from? Did you infer that from the thread dumps?

Thanks

Suraj
