What is the purpose of 'Thread Count' & 'Batch scalability factor' in the Edit Settings of Data Flow configuration screen

Question

SrikanthK2988

Member since 2014

1 post

Posted: 24 Apr 2018 15:59 EDT
Last activity: 26 Apr 2018 12:54 EDT

Closed

We are trying to optimize our data flow to handle around 14 million records. On the Designer Studio --> Infrastructure --> Services --> Data Flow configuration screen, clicking 'Edit settings' shows two configuration items: 'Thread Count' and 'Batch Scalability Factor'. What is the purpose of the 'Batch Scalability Factor' setting? If we set Thread Count to 3 and Batch Scalability Factor to 2, what exactly does that imply?

We are on Pega 7.2.1 and PegaMarketing 7.21

Thanks

***Moderator Edit: Vidyaranjan | Updated Categories***

Decision Management

Posted: 25 Apr 2018 7:59 EDT

BASAVARAJ

PEGA

replied to SrikanthK2988

Hi Srikanth,

Batch scalability factor is used to calculate the suggested number of partitions for a data flow run. That number is calculated using this formula: numOfNodes * threadCount * scalabilityFactor. Keep in mind that this calculation only suggests a number of partitions; it is up to the data set implementation to decide how many partitions will actually be used.
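As a rough illustration of the formula above, the arithmetic can be sketched in Python. The function name `suggested_partitions` and its parameter names are hypothetical and are not part of any Pega API. The node count of 4 is an assumed value, since the original question does not state one.

```python
def suggested_partitions(num_nodes: int, thread_count: int, scalability_factor: int) -> int:
    """Return numOfNodes * threadCount * scalabilityFactor.

    This is only the suggested partition count; the data set
    implementation decides how many partitions are actually used.
    """
    if min(num_nodes, thread_count, scalability_factor) < 1:
        raise ValueError("all inputs must be positive integers")
    return num_nodes * thread_count * scalability_factor


# Thread Count = 3 and Batch Scalability Factor = 2 come from the question;
# the 4 nodes are an assumption made purely for this example.
print(suggested_partitions(num_nodes=4, thread_count=3, scalability_factor=2))  # 24
```

Under those assumed values the suggestion would be 24 partitions, which the data set may or may not honor.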

Thread count: by default, nodes are configured to run with 5 threads. Each node that takes part in the data flow execution needs to be included in the service cluster. Note that setting a large thread count won't necessarily improve data flow execution speed. It's important to take the number of available cores into consideration when deciding on this value.
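One way to picture why a large thread count may not help is the simplified model below. It assumes the work is CPU-bound, so that no more threads than cores can execute at the same instant on a node; real data flows often wait on I/O, so treat this as an illustration only. The function name and the 8-core figure are hypothetical and do not come from Pega.

```python
def concurrently_running_threads(thread_count: int, available_cores: int) -> int:
    """Upper bound on threads executing at the same instant on one node.

    Assumption: CPU-bound work, so at most one thread runs per core.
    """
    return min(thread_count, available_cores)


# Assumed node with 8 cores; compare a few candidate thread counts.
for requested in (5, 8, 16):
    running = concurrently_running_threads(requested, available_cores=8)
    print(f"thread count {requested}: at most {running} running at once")
```

In this model, raising the thread count from 8 to 16 on an 8-core node does not raise the number of threads that actually run at once.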

Let me know if the above information helps.

Regards,

Basavaraj

Posted: 25 Apr 2018 9:19 EDT

SrikanthK2988

replied to BASAVARAJ

Thanks, Basavaraj.

Does this mean that the scalabilityFactor has no effect on the number of partitions that are processed in parallel? If we have 5 nodes, set threadCount to 2 and batchScalabilityFactor to 2, and have 20 partitions, is the number of partitions processed in parallel 10 (5 nodes * 2 threads) and not 20 (5 nodes * 2 threads * 2 scalabilityFactor)?

Regards,

Srikanth

Posted: 26 Apr 2018 4:58 EDT

Raju Botu

Coforge

replied to SrikanthK2988

Hello Srikanth

Are there any new findings on this?

I am trying to figure this out too.

Regards

Raju Botu

Posted: 26 Apr 2018 12:54 EDT

SrikanthK2988

replied to Raju Botu

From what we have seen in our testing, the batchScalabilityFactor does not have any effect on the number of partitions that are processed in parallel. Only the 'Thread Count' determines how many partitions are processed in parallel. This is the case for an RDBMS database (Oracle). We heard that the batchScalabilityFactor comes into the picture when dealing with a Cassandra database, but I am not sure how that works.
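The behavior described above can be sketched as a small calculation, using the 5-node, 2-thread, 20-partition numbers from earlier in this thread. This models only what was observed in testing against Oracle, not documented Pega behavior. The function names are hypothetical, and the "waves" figure assumes equally sized partitions that are picked up in rounds.

```python
import math


def parallel_partitions(num_nodes: int, thread_count: int) -> int:
    """Partitions processed at once, as observed: nodes * threads only."""
    return num_nodes * thread_count


def processing_waves(total_partitions: int, num_nodes: int, thread_count: int) -> int:
    """Rounds needed to work through all partitions (simplified model)."""
    return math.ceil(total_partitions / parallel_partitions(num_nodes, thread_count))


# Numbers from the thread: 5 nodes, threadCount 2, 20 partitions.
# batchScalabilityFactor is deliberately absent from this calculation.
print(parallel_partitions(5, 2))   # 10
print(processing_waves(20, 5, 2))  # 2
```

Under this model, the 20 partitions would be processed 10 at a time in 2 rounds, regardless of the batchScalabilityFactor value.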

Regards,

Srikanth
