Support Center

Question

Ratan

Member since 2015

65 posts

Areteans Technology Solutions

Posted: 27 Apr 2020 5:38 EDT
Last activity: 10 May 2020 11:12 EDT

Closed

Data flow Partitions

How does the system derive the number of partitions used to execute a data flow? Say I have one node (10 cores) configured as a data flow node, and the thread count is set to 1 when I create a Batch processing data flow work object. How many partitions would the system create? Does that depend on the data? If so, how would the data affect the partitions created by the data flow?

Assume that we are processing 100 records with partition keys distributed across 0 to 9.

***Edited by Moderator: Pallavi to change content type from Discussion to Question***

Pega Marketing 8.3

Decision Management

Communications and Media

Lead System Architect

Next-Best-Action Analyst

Posted: 27 Apr 2020 15:08 EDT
Updated: 4 May 2020 10:06 EDT

RakeshM16827894

Barclays

replied to Ratan

The partitioning depends on the source data set you are using in the data flow.

For example, if the source is a database table, the data is distributed across partitions based on the partition key.

If I take age as the partition key, the data flow creates one partition for each age group (that is, each distinct partition-key value).
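To make the database-table case concrete, here is a minimal sketch, assuming partitions are formed simply by grouping records on the partition-key value. The function `partition_by_key` and the sample customer records are hypothetical illustrations, not Pega APIs or Pega's actual implementation.

```python
from collections import defaultdict


def partition_by_key(records, key):
    """Group records into one partition per distinct partition-key value.

    Illustrative only (hypothetical helper): models the behaviour described
    above for a database-table source, not Pega's internal logic.
    """
    partitions = defaultdict(list)
    for record in records:
        partitions[record[key]].append(record)
    return dict(partitions)


# Hypothetical sample data: customers partitioned by age.
customers = [
    {"id": 1, "age": 30},
    {"id": 2, "age": 45},
    {"id": 3, "age": 30},
    {"id": 4, "age": 52},
]

parts = partition_by_key(customers, "age")
for age, rows in sorted(parts.items()):
    print(age, [r["id"] for r in rows])
# 30 [1, 3]
# 45 [2]
# 52 [4]
```

Three distinct ages yield three partitions; under this assumption, the number of partitions follows the data rather than the node or thread configuration.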

In the case of a real-time data flow whose source is a Stream data set, 20 partitions are created by default; however, that number can be changed.

With a Stream data set, I can choose the customer ID as the partition key.

In this case (Stream data sets only), the Kafka topic assigns records to partitions using a hash algorithm.
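The hash-based assignment can be illustrated with a small sketch, assuming a fixed partition count (the default of 20 mentioned above) and a key hashed modulo that count. CRC-32 from Python's standard library is used purely as a deterministic stand-in; the hash function actually used by the Stream data set or Kafka may differ. `partition_for` is a hypothetical name.

```python
import zlib

DEFAULT_PARTITIONS = 20  # default partition count for a Stream data set, per the reply above


def partition_for(key: str, num_partitions: int = DEFAULT_PARTITIONS) -> int:
    """Map a record key (e.g. a customer ID) to a partition index.

    Hypothetical helper: CRC-32 is only a stand-in hash, not necessarily
    the algorithm Pega or Kafka uses. The important property is that the
    same key always maps to the same partition.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


for customer_id in ["CUST-001", "CUST-002", "CUST-001"]:
    print(customer_id, partition_for(customer_id))
```

Because the mapping is deterministic, all records for one customer land in the same partition, while different customers are spread across the available partitions.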

If the source is a File data set, only one partition is created.

Posted: 3 May 2020 4:52 EDT

Ratan

Areteans Technology Solutions

replied to Ratan

Hey @mahar2, thanks for the response. Could you explain the scenario that I described in my original post? The data set is associated with a database table. There are 100 records with partition keys ranging from 0 to 9, equally distributed. There is only one data flow node, and the thread count on the data flow landing page is set to 1.

Posted: 10 May 2020 11:12 EDT

RakeshM16827894

Barclays

replied to Ratan

Hi Ratan,

In your case, you have one node, the thread count is 1, and you have 10 partitions, say P1 to P10.

So Node1 - Thread 1 processes the records of P1.

Once all the records of P1 are processed, it picks another partition (say P2) and processes the records of P2, and so on until every partition has been processed.
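The scheduling described in this reply can be simulated with a minimal sketch, assuming each thread takes one whole partition from a shared queue, processes all of its records, and then takes the next one. The function `run_batch` and the partition names are hypothetical; this models the explanation above, not Pega's actual scheduler.

```python
import queue
import threading


def run_batch(partitions, thread_count):
    """Process partitions with a fixed pool of worker threads.

    Hypothetical simulation: each thread pulls one whole partition from a
    shared FIFO queue, "processes" all of its records, then pulls the next.
    """
    work = queue.Queue()
    for name in partitions:
        work.put(name)

    log = []  # (thread name, partition name, records processed)
    lock = threading.Lock()

    def worker(thread_name):
        while True:
            try:
                name = work.get_nowait()
            except queue.Empty:
                return  # no partitions left to pick up
            count = len(partitions[name])  # stand-in for processing each record
            with lock:
                log.append((thread_name, name, count))

    threads = [threading.Thread(target=worker, args=(f"Thread {i + 1}",))
               for i in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


# Scenario from the question: 100 records, partition keys 0-9 evenly spread.
records = [{"id": i, "key": i % 10} for i in range(100)]
partitions = {}
for r in records:
    partitions.setdefault(f"P{r['key'] + 1}", []).append(r)

for entry in run_batch(partitions, thread_count=1):
    print(entry)
# ('Thread 1', 'P1', 10) ... ('Thread 1', 'P10', 10)
```

With a thread count of 1, the single thread works through P1 to P10 one after another, 10 records each. Under the same assumption, raising `thread_count` would let several partitions be processed in parallel, but each partition is still handled by one thread at a time.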

Thanks,

Rakesh
