Question
CMC Markets
GB
Last activity: 21 Dec 2022 7:47 EST
PDC shows slow query on PEGA.PR_DATA_QP_RUN_PARTITION table.
Hello,
We recently upgraded to 8.7.3 and can see that PDC is reporting Slow SQL alerts in production on the table PEGA.PR_DATA_QP_RUN_PARTITION.
It logs at least 10 of these events every day, with a couple of them exceeding 30 seconds; at other times the query seems to run fine.
SQL - UPDATE PEGA.pr_data_qp_run_partition "PC0" SET "PXPRODUCEDATETIME" = ? WHERE ("PC0"."PXQUEUEPROCESSORNAME" = ? AND "PC0"."PXSTREAMPARTITION" = ? AND ("PC0"."PXPRODUCEDATETIME" IS NULL OR "PC0"."PXPRODUCEDATETIME" < ?))
This is an OOTB Pega table, and there is already an index on it covering the columns PXQUEUEPROCESSORNAME and PXSTREAMPARTITION.
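In case it helps, this is roughly how the execution plan can be checked (a sketch only, assuming PostgreSQL; syntax varies by vendor, and the literals below are placeholders for the bind variables):

-- Sketch only: EXPLAIN ANALYZE actually executes the UPDATE,
-- so wrap it in a transaction and roll it back.
BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
UPDATE pega.pr_data_qp_run_partition
SET pxproducedatetime = current_timestamp
WHERE pxqueueprocessorname = 'pyFTSIncrementalIndexer'
  AND pxstreampartition = '0'
  AND (pxproducedatetime IS NULL
       OR pxproducedatetime < current_timestamp);
ROLLBACK;

If the plan shows an index scan, the intermittent slowness is more likely lock waits or connection trouble than a missing index.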
Has anyone else faced this issue, and how was it resolved?
Thanks,
Rajesh.
***Edited by Moderator Marije to add Support Case Details***
Accepted Solution
Updated: 21 Dec 2022 7:47 EST
Pegasystems Inc.
GB
I can see the following recommendations were made in INC-252581:
"Based on the information provided, it appears that your system suffered from a temporary database connection glitch that caused the sudden PEGA0005 alert spike.
See the recommendations below:
1) Please consider allocating a dedicated JVM for Stream nodes.
2) In your prconfig.xml, we found the following connection pool entries:
<env name="database/databases/PegaRULES/dataSource" value="java:comp/env/jdbc/PegaRULES"/>
<env name="database/databases/Sample/dataSource" value="java:comp/env/jdbc/Sample"/>
<env name="database/databases/PegaDATA/dataSource" value="java:comp/env/jdbc/PegaDATA"/>
When the connection pool settings are specified in prconfig.xml, the system uses Pega's internal connection pool. This configuration was used in very old versions of Pega. You should not need these settings in prconfig.xml, as you have already defined the connection pools in Tomcat's context.xml.
3) For JDBC connection pooling in Tomcat, you can use either the Tomcat JDBC Connection Pool (org.apache.tomcat.jdbc.pool) or the Apache Commons DBCP connection pool. When you define factory="org.apache.tomcat.jdbc.pool.DataSourceFactory", Tomcat uses the Tomcat JDBC Connection Pool implementation; otherwise Tomcat defaults to the Apache Commons DBCP2 implementation.
Note that you are using a different connection pool implementation for jdbc/Sample compared to jdbc/PegaRULES and jdbc/PegaDATA. Unless you have a valid business reason, we recommend using the same implementation for all connection pools.
4) For your jdbc/PegaRULES and jdbc/PegaDATA connection pools, where factory="org.apache.tomcat.jdbc.pool.DataSourceFactory" has been defined, Tomcat uses the Tomcat JDBC Connection Pool implementation. With this implementation in place, the connection parameters maxActive, maxIdle, and maxWait should be used. In your current configuration, you are using maxTotal instead of maxActive."
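To make recommendation 4 concrete, here is a minimal context.xml Resource sketch; the driver, URL, credentials, and pool sizes are illustrative placeholders, not values taken from the incident:

<!-- Sketch only: all values below are placeholders. With this factory,
     Tomcat uses the Tomcat JDBC pool, which reads maxActive/maxIdle/maxWait
     (maxTotal belongs to the Commons DBCP2 implementation instead). -->
<Resource name="jdbc/PegaRULES"
          auth="Container"
          type="javax.sql.DataSource"
          factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
          driverClassName="org.postgresql.Driver"
          url="jdbc:postgresql://dbhost:5432/pega"
          username="pega_rules"
          password="********"
          maxActive="50"
          maxIdle="10"
          maxWait="30000"
          validationQuery="SELECT 1"
          testOnBorrow="true"/>

The validationQuery/testOnBorrow attributes are optional, but validating connections on borrow can help surface the kind of transient connection glitch described above.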
References:
- https://tomcat.apache.org/migration-8.html#Database_Connection_Pooling
- https://tomcat.apache.org/tomcat-8.0-doc/jdbc-pool.html
Issue identified as a socket issue to be investigated with the DBA.
Pegasystems Inc.
GB
@FromRajesh, below are some suggestions. I am not an SME in Stream/Kafka issues, but the slow queries hitting the PR_DATA_QP_RUN_PARTITION table were likely caused by the increased processing from the pyFTSIncrementalIndexer.
When it comes to the performance of a queue processor (background processing in Pega), there are two aspects:
1. Message processing time:
The time to process a message depends on the amount of work done by the processing activity. It will be helpful to check the performance of the QP activity and identify any bottleneck under load testing.
Also, please refer to the best practices for writing activities for background jobs.
2. Total message throughput:
Scale out by increasing the number of processing nodes. If the CPU usage is well below maximum, you do not need to scale nodes right away.
Check the threads per node and verify that the total number of threads matches the Kafka partition count (for example, 2 nodes with 10 threads each would match 20 partitions).
Increasing the Kafka partition count is not a straightforward recommendation, as it can have other implications.
Increasing the partitions should also be accompanied by increasing the thread count per background node, and a higher number of threads can cause higher CPU usage and hence processing slowness.
Moreover, the DSS setting that increases the partition count affects every queue processor you create from then on, so there can be an impact on the Kafka side itself.
So we do not recommend increasing the partition count.
Furthermore, increasing the partition count will not help when the activity itself is the bottleneck: once a record has been read from Kafka and handed to the activity for processing, the partition count no longer plays a role.
Next steps (****not for prod environments****):
If you have seen the processing time itself go up, then it is the activity that is taking too long.
- In that case you should see PEGA0117 alerts in PDC.
Configure the default partition count via a Dynamic System Setting:
- Owning Ruleset = Pega-Engine
- Setting Purpose = prconfig/dsm/services/stream/pyTopicPartitionsCount/default
- Value = N [N = number of partitions]
- Write down the list of all queue processors
- Stop all queue processors
- Stop all Pega nodes
- Check the ERROR logs
- Check the DSS timeout for cluster node connection management in master election and admin operations on the Elasticsearch side:
  Ruleset: Pega-SearchEngine, Purpose: indexing/distributed/cluster_management_timeout_millis
- Reset the queue processor partitions by running the following statement:
- delete from pr_data_decision_df_part where pxWorkID in ('<qp-name-1>', '<qp-name-2>', ..., '<qp-name-134>')
where <qp-name-x> corresponds to the list of queue processors.
- Truncate the Stream service tables in the Pega platform:
- truncate table pr_data_stream_nodes
- truncate table pr_data_stream_sessions
- truncate table pr_data_stream_node_updates
- Delete the relevant records from pr_data_qp_run_partition:
- delete from pr_data_qp_run_partition where pxqueueprocessorname in ('<QP1>', ..., '<QPn>')
- Delete the Kafka directories from all stream nodes
- Start the stream nodes
- Start the rest of the Pega nodes
- Restart the queue processors if they have not started. Verify the partition count on the Stream landing page.
- Pick one queue processor and verify that it uses the configured number of partitions:
- Open the associated data flow
- On the Distribution tab, check the number of partitions; it should match the configured number.
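As an extra sanity check, you could also count the partition rows directly in the database. This is a sketch that assumes pr_data_qp_run_partition holds one row per queue processor per stream partition; verify that assumption against your schema first:

-- Sketch only: assumes one row per queue processor per partition,
-- so the count per QP should equal the configured partition count.
select pxqueueprocessorname, count(*) as partitions
from pega.pr_data_qp_run_partition
group by pxqueueprocessorname
order by pxqueueprocessorname;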
If you need further investigation, I suggest that you log a support incident (INC) on the MSP. In the ticket, please provide the following details:
- The system name, which can be checked from PDC.
- The name of the custom queue processor.
- More metrics from the activity processing side, such as whether any particular step is taking longer.
- The timestamp of your next load test, which we can verify from PDC.
CMC Markets
GB
@MarijeSchillern , thanks for your response.
1) The alerts point to the rule PZFTSINCREMENTALINDEXERUSINGQP, which is associated with the QP pyFTSIncrementalIndexer. The number of threads is set to 2, the same as the OOTB default, and hasn't changed.
2) I don't see any PEGA0117 alerts around that time.
I will raise an SR with the relevant details.
CMC Markets
GB
@MarijeSchillern - INC-252581 for reference.