Progressive deterioration of CPU usage across web nodes

Question

GwenM16680339

Member since 2022

1 post

XYZ

Posted: 9 Nov 2022 6:58 EST
Last activity: 15 Sep 2023 6:55 EDT

Closed

Hello all,

My team and I are experiencing CPU problems, and after months of analysis we still haven't figured out what is causing them.

Let me explain a bit of our application architecture:

We currently have two main applications; let's call them X and Y.

Application X is used mostly through REST services: we have a front end that invokes Pega's APIs. It is built on Pega Platform.

Application Y is built on top of the CS (Customer Service) framework and application X.

Now let's go to the problem:

For the last couple of months we have been having issues with CPU spikes across web nodes.

After a restart everything works fine, but a couple of days later we start to notice the first CPU spikes on just one or two nodes. As the days pass, the nodes that were the first to show spikes show much bigger spikes, and other nodes start to show degradation as well.

The latest evidence shows that application Y, the one built on Customer Service, is causing the spikes.

What we have done so far:

Checked events in PDC one minute before and one minute after the spikes -> no luck so far; I even had help from an LSA, who confirmed that the events in PDC are not related to the spikes
Thread dump: there was some evidence of slow JDBC messages, but it doesn't seem to be strictly related to the CPU spikes
Checked the heap: heap values were within normal ranges; no indicators of a memory leak
Checked the load balancer to see whether traffic is unevenly distributed
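
A possible way to tighten the thread dump step: if per-thread CPU is also captured at the OS level during a spike (for example with `top -H -p <pid>` on Linux), the decimal thread IDs can be matched against the hexadecimal `nid=0x...` field in a HotSpot-style thread dump. The sketch below is only an illustration under that assumption. The function names are hypothetical, nothing here comes from Pega itself, and other JVMs or newer JDKs may format the thread header differently.

```python
import re

# Header line of one thread in a HotSpot-style dump (assumed layout), e.g.
# "http-exec-1" #25 daemon prio=5 os_prio=0 tid=0x... nid=0x1a2b runnable [0x...]
THREAD_HEADER = re.compile(
    r'^"(?P<name>.*)".*\bnid=0x(?P<nid>[0-9a-fA-F]+)\s*(?P<state>.*)$'
)


def index_threads(dump_text):
    """Map native thread id (as a decimal int) -> (thread name, state text)."""
    threads = {}
    for line in dump_text.splitlines():
        match = THREAD_HEADER.match(line)
        if match:
            native_id = int(match.group("nid"), 16)  # hex nid -> decimal OS id
            threads[native_id] = (match.group("name"), match.group("state").strip())
    return threads


def hot_threads(dump_text, hot_tids):
    """Look up the dump entries for OS thread ids reported as CPU-heavy.

    Ids that do not appear in the dump map to None.
    """
    threads = index_threads(dump_text)
    return {tid: threads.get(tid) for tid in hot_tids}
```

Feeding it the text of a dump taken during a spike, plus the thread IDs that `top -H` showed at the top at the same moment, would show whether the hot threads are request threads, background agents, or JVM-internal threads such as GC.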

I am running out of ideas about what to do next. Does anyone have an idea what else could cause these issues?

Maybe some loops inside the code? But that doesn't make sense, since the degradation of CPU usage happens gradually.

I've attached a screenshot of CPU usage by JVM from October 26th until November 7th, in which you can see the progressive deterioration of the web nodes.
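
To put a number on the pattern visible in a chart like this, one option is to export the CPU figures per node and fit a simple trend line for each node. The sketch below is a hypothetical helper, not part of any Pega tooling: it assumes the data is available as (day, CPU percent) pairs per node, and the names `slope_per_day` and `rank_nodes_by_trend` are made up for illustration.

```python
def slope_per_day(samples):
    """Least-squares slope of CPU % against day number for one node.

    samples: list of (day, cpu_percent) pairs covering at least two distinct days.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(day for day, _ in samples) / n
    mean_y = sum(cpu for _, cpu in samples) / n
    sxx = sum((day - mean_x) ** 2 for day, _ in samples)
    if sxx == 0:
        raise ValueError("need samples from at least two distinct days")
    sxy = sum((day - mean_x) * (cpu - mean_y) for day, cpu in samples)
    return sxy / sxx


def rank_nodes_by_trend(cpu_by_node):
    """Return (node, slope) pairs, steepest upward CPU trend first."""
    slopes = {node: slope_per_day(s) for node, s in cpu_by_node.items()}
    return sorted(slopes.items(), key=lambda item: item[1], reverse=True)
```

Nodes with the steepest positive slope since the last restart are the ones degrading fastest, which may help decide where to take thread dumps first.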

***Edited by Moderator Marije to add Capability tags***

Pega Platform 8.5.4