Question
CMC Markets
GB
Last activity: 21 Dec 2022 7:47 EST
PDC shows slow query on PEGA.PR_DATA_QP_RUN_PARTITION table.
Hello,
We recently upgraded to 8.7.3 and can see that PDC is reporting Slow SQL alerts in production on the table PEGA.PR_DATA_QP_RUN_PARTITION.
It logs at least 10 of these events every day, with a couple of them exceeding 30 seconds; at other times the query seems to run fine.
SQL - UPDATE PEGA.pr_data_qp_run_partition "PC0" SET "PXPRODUCEDATETIME" = ? WHERE ("PC0"."PXQUEUEPROCESSORNAME" = ? AND "PC0"."PXSTREAMPARTITION" = ? AND ("PC0"."PXPRODUCEDATETIME" IS NULL OR "PC0"."PXPRODUCEDATETIME" < ?))
This is an OOTB Pega table, and there is already an index on it covering the columns PXQUEUEPROCESSORNAME and PXSTREAMPARTITION.
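In case it helps, this is roughly how the execution plan can be checked (a sketch only, assuming PostgreSQL; syntax varies by vendor, and the literals below are placeholders for the bind variables):

-- Sketch only: EXPLAIN ANALYZE actually executes the UPDATE,
-- so wrap it in a transaction and roll it back.
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
UPDATE pega.pr_data_qp_run_partition
SET pxproducedatetime = current_timestamp
WHERE pxqueueprocessorname = 'pyFTSIncrementalIndexer'
  AND pxstreampartition = '0'
  AND (pxproducedatetime IS NULL
       OR pxproducedatetime < current_timestamp);
ROLLBACK;

If the plan shows an index scan, the intermittent slowness is more likely lock waits or connection trouble than a missing index.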
Has anyone else faced this issue, and how was it resolved?
Thanks,
Rajesh.
***Edited by Moderator Marije to add Support Case Details***
Accepted Solution
Updated: 21 Dec 2022 7:47 EST
Pegasystems Inc.
GB
I can see the following recommendations were made in INC-252581:
"Based on the information provided, it appears that your system suffered from a temporary database connection glitch that caused the sudden PEGA0005 alert spike.
See the recommendations below:
1) Please consider allocating a dedicated JVM for Stream nodes.
2) In your prconfig.xml, we found the following connection pool entries:
<env name="database/databases/PegaRULES/dataSource" value="java:comp/env/jdbc/PegaRULES"/>
<env name="database/databases/Sample/dataSource" value="java:comp/env/jdbc/Sample"/>
<env name="database/databases/PegaDATA/dataSource" value="java:comp/env/jdbc/PegaDATA"/>
When the connection pool settings are specified in prconfig.xml, the system uses Pega's internal connection pool. This configuration was used in very old versions of Pega. You should not need these settings in prconfig.xml, as you have already defined the connection pools in Tomcat's context.xml.
3) For JDBC connection pooling in Tomcat, you can use either the Tomcat JDBC Connection Pool (org.apache.tomcat.jdbc.pool) or the Apache Commons DBCP connection pool. When you define factory="org.apache.tomcat.jdbc.pool.DataSourceFactory", Tomcat uses the Tomcat JDBC Connection Pool implementation; otherwise Tomcat defaults to the Apache Commons DBCP2 implementation.
Note that you are using a different connection pool implementation for jdbc/Sample compared to jdbc/PegaRULES and jdbc/PegaDATA. Unless you have a valid business reason, we recommend using the same implementation for all connection pools.
4) For your jdbc/PegaRULES and jdbc/PegaDATA connection pools, where factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" has been defined, Tomcat uses the Tomcat JDBC Connection Pool implementation. With this implementation in place, the connection parameters maxActive, maxIdle, and maxWait should be used. In your current configuration, you are using maxTotal instead of maxActive."
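To make recommendation 4 concrete, here is a minimal context.xml Resource sketch; the driver, URL, credentials, and pool sizes are illustrative placeholders, not values taken from the incident:

<!-- Sketch only: all values below are placeholders. With this factory,
     Tomcat uses the Tomcat JDBC pool, which reads maxActive/maxIdle/maxWait
     (maxTotal belongs to the Commons DBCP2 implementation instead). -->
<Resource name="jdbc/PegaRULES"
          auth="Container"
          type="javax.sql.DataSource"
          factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
          driverClassName="org.postgresql.Driver"
          url="jdbc:postgresql://dbhost:5432/pega"
          username="pega_rules"
          password="********"
          maxActive="50"
          maxIdle="10"
          maxWait="30000"
          validationQuery="SELECT 1"
          testOnBorrow="true"/>

The validationQuery/testOnBorrow attributes are optional, but validating connections on borrow can help surface the kind of transient connection glitch described above.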
References:
- https://tomcat.apache.org/migration-8.html#Database_Connection_Pooling
- https://tomcat.apache.org/tomcat-8.0-doc/jdbc-pool.html
Issue identified as a socket issue to be investigated with the DBA.
Pegasystems Inc.
GB
@FromRajesh, below are some suggestions. I am not an SME in Stream/Kafka issues, but the slow queries hitting the PR_DATA_QP_RUN_PARTITION table were likely caused by the increased processing from the pyFTSIncrementalIndexer.
When it comes to the performance of a queue processor (background processing in Pega), there are two aspects:
1. Message processing time:
The time to process a message depends on the amount of work done by the processing activity. It will be helpful to check the performance of the QP activity and identify any bottleneck under load testing.
Also, please refer to the best practices for writing activities for background jobs.
2. Total message throughput:
Scale out by increasing the number of processing nodes. If the CPU usage is well below maximum, you do not need to scale nodes right away.
Check the threads per node and verify that the total number of threads matches the Kafka partition count (for example, 2 nodes with 10 threads each would match 20 partitions).
Increasing the Kafka partition count is not a straightforward recommendation, as it can have other implications.
Increasing the partitions should also be accompanied by increasing the thread count per background node, and a higher number of threads can cause higher CPU usage and hence processing slowness.
Moreover, the DSS setting that increases the partition count affects every queue processor you create from then on, so there can be an impact on the Kafka side itself.
So we do not recommend increasing the partition count.
Furthermore, increasing the partition count will not help when the activity itself is the bottleneck: once a record has been read from Kafka and handed to the activity for processing, the partition count no longer plays a role.
Next steps (****not for prod environments****):
If you have seen the processing time itself go up, then it is the activity that is taking too long.
- In that case you should see PEGA0117 alerts in PDC.
Configure the default partition count via a Dynamic System Setting:
- Owning Ruleset = Pega-Engine
- Setting Purpose = prconfig/dsm/services/stream/pyTopicPartitionsCount/default
- Value = N [N = number of partitions]
- Write down the list of all queue processors
- Stop all queue processors
- Stop all Pega nodes
- Check the ERROR logs
- Check the DSS timeout for cluster node connection management in master election and admin operations on the Elasticsearch side:
  Ruleset: Pega-SearchEngine, Purpose: indexing/distributed/cluster_management_timeout_millis
- Reset the queue processor partitions by running the following statement:
- delete from pr_data_decision_df_part where pxWorkID in ('<qp-name-1>', '<qp-name-2>', ..., '<qp-name-134>')
where <qp-name-x> corresponds to the list of queue processors.
- Truncate the Stream service tables in the Pega platform:
- truncate table pr_data_stream_nodes
- truncate table pr_data_stream_sessions
- truncate table pr_data_stream_node_updates
- Delete the relevant records from pr_data_qp_run_partition:
- delete from pr_data_qp_run_partition where pxqueueprocessorname in ('<QP1>', ..., '<QPn>')
- Delete the Kafka directories from all stream nodes
- Start the stream nodes
- Start the rest of the Pega nodes
- Restart the queue processors if they have not started. Verify the partition count on the Stream landing page.
- Pick one queue processor and verify that it uses the configured number of partitions:
- Open the associated data flow
- On the Distribution tab, check the number of partitions; it should match the configured number.
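As an extra sanity check, you could also count the partition rows directly in the database. This is a sketch that assumes pr_data_qp_run_partition holds one row per queue processor per stream partition; verify that assumption against your schema first:

-- Sketch only: assumes one row per queue processor per partition,
-- so the count per QP should equal the configured partition count.
select pxqueueprocessorname, count(*) as partitions
from pega.pr_data_qp_run_partition
group by pxqueueprocessorname
order by pxqueueprocessorname;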
If you need further investigation, I suggest that you log a support incident (INC) on the MSP. In the ticket, please provide the following details:
- The system name, which can be checked from PDC.
- The name of the custom queue processor.
- More metrics from the activity processing side, such as whether any particular step is taking longer.
- The timestamp of your next load test, which we can verify from PDC.
CMC Markets
GB
@MarijeSchillern , thanks for your response.
1) The alerts point to the rule PZFTSINCREMENTALINDEXERUSINGQP, which is associated with the QP pyFTSIncrementalIndexer. The number of threads is set to 2, the same as the OOTB default, and hasn't changed.
2) I don't see any PEGA0117 alerts around that time.
I will raise an SR with the relevant details.
CMC Markets
GB
@MarijeSchillern - INC-252581 for reference.