Question
Last activity: 26 Apr 2018 12:54 EDT
What is the purpose of 'Thread Count' and 'Batch Scalability Factor' in the Edit Settings of the Data Flow configuration screen?
We are trying to optimize our data flow to handle around 14 million records. On the Designer Studio --> Infrastructure --> Services --> Data Flow configuration screen, clicking 'Edit settings' shows two configuration items: 'Thread Count' and 'Batch Scalability Factor'. What is the purpose of the 'Batch Scalability Factor' setting? If we set Thread Count to 3 and Batch Scalability Factor to 2, what exactly does that imply?
We are on Pega 7.2.1 and Pega Marketing 7.21.
Thanks
Hi Srikanth,
Batch Scalability Factor is used to calculate the suggested number of partitions for a data flow run. That number is calculated with this formula: numOfNodes * threadCount * scalabilityFactor. Keep in mind that this calculation only suggests a number of partitions; it's up to the data set implementation to decide how many partitions are actually used.
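To make the formula concrete for the settings in the question, here is a minimal Python sketch. It only does the arithmetic from the formula above; the function names are hypothetical and not a Pega API. The 2-node service cluster and the even split of the 14 million records across partitions are assumptions for illustration, since the real data set decides the actual partitioning.

```python
# Hypothetical sketch of the partition suggestion described above:
#   suggested = numOfNodes * threadCount * scalabilityFactor
# Names are illustrative only; this is not a Pega API.


def suggested_partitions(num_nodes: int, thread_count: int, scalability_factor: int) -> int:
    """Return the suggested partition count for a data flow run."""
    if min(num_nodes, thread_count, scalability_factor) < 1:
        raise ValueError("all inputs must be positive")
    return num_nodes * thread_count * scalability_factor


def records_per_partition(total_records: int, partitions: int) -> int:
    """Assumes an even split (rounded up); a real data set may distribute unevenly."""
    return -(-total_records // partitions)  # ceiling division


if __name__ == "__main__":
    # Thread Count = 3, Batch Scalability Factor = 2, assuming a 2-node service cluster
    parts = suggested_partitions(num_nodes=2, thread_count=3, scalability_factor=2)
    print(parts)                                      # 12
    print(records_per_partition(14_000_000, parts))   # 1166667
```

In other words, each node contributes threadCount * scalabilityFactor = 6 suggested partitions, so the suggestion grows linearly as nodes are added to the service cluster.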
Thread Count: by default, nodes are configured to run with 5 threads. Each node that takes part in the data flow execution must be included in the service cluster. Note that setting a large thread count won't necessarily make the data flow run faster; take the number of available cores into consideration when choosing this value.
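As a rough sanity check before raising Thread Count, the sketch below compares a per-node thread count with the node's core count. Treating "more threads than cores" as oversubscription is an assumed heuristic, not a Pega rule; a data flow that spends time waiting on I/O may still benefit from a few extra threads. The helper name is hypothetical.

```python
import os
from typing import Optional

DEFAULT_THREAD_COUNT = 5  # per-node default mentioned in the answer


def exceeds_cores(thread_count: int, cores: Optional[int] = None) -> bool:
    """Assumed heuristic: True if a node would run more data flow threads than it has cores."""
    if thread_count < 1:
        raise ValueError("thread_count must be positive")
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return thread_count > cores


if __name__ == "__main__":
    # Example: a hypothetical node with 4 cores
    for threads in (3, DEFAULT_THREAD_COUNT, 16):
        print(threads, exceeds_cores(threads, cores=4))
```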
Let me know if the above information helps.
Regards,
Basavaraj