Dataflow: ratio requestor/partitions

Question

Phil5873

Member since 2019

9 posts

Realvalue Consulting

Posted: 8 Jan 2025 5:36 EST
Last activity: 16 Jan 2025 4:03 EST

Solved

Hi everyone,

I am running a dataflow sourced by a report definition, with a defined partition key.

Let's assume the partition key can have x possible distinct values, and the dataflow will run on y nodes.

My questions are the following:

1) In the options configuration panel, before launching the dataflow execution, what is the most appropriate number of requestors to set in order to maximise throughput?

2) Is it correct to assume that the number of requestors * the number of nodes must be equal to or slightly greater than the number of possible distinct values of the partition key?

Thanks to anyone who can help me with this doubt.

Pega Infinity

Data Management

Performance

Customization

Decision Management

Installation and Deployment

Government

Lead System Architect

Accepted Solution

Posted: 14 Jan 2025 9:58 EST
Updated: 16 Jan 2025 4:03 EST

Sairohith

HCA Healthcare

replied to Phil5873

@Phil5873 To maximize throughput when running a dataflow sourced by a report definition with a partition key, set the number of requestors based on the number of nodes and the capacity of each node. The goal is to process as many partitions as possible in parallel without overloading the system. Ideally, the total number of requestors across all nodes should be equal to or slightly greater than the number of distinct values of the partition key.

For example, with 20 distinct values and 3 nodes that can each handle 4 requestors, setting 12 requestors in total (3 nodes * 4 requestors) makes full use of the available capacity, even though 12 is below the ideal of 20. If the number of distinct values exceeds the total requestors, as it does here, aim for a higher requestor count where node capacity allows, to avoid bottlenecks. Conversely, if there are fewer partition values than requestors, lower the count to avoid wasting resources.

Balancing the requestors against the available nodes and the partition key values gives the best performance: it exploits parallel processing without overwhelming system resources.
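The sizing rule described here can be sketched as a small calculation. This is only an illustration, not a Pega API: the function names and the `node_capacity` parameter are hypothetical, and it assumes each requestor works on one partition at a time and that partitions are roughly equal in size.

```python
import math


def requestors_per_node(distinct_values: int, nodes: int, node_capacity: int) -> int:
    """Suggest a per-node requestor count for a partitioned run.

    distinct_values: number of distinct partition-key values (x in the question)
    nodes:           number of nodes running the dataflow (y in the question)
    node_capacity:   hypothetical upper bound on requestors one node can sustain
    """
    if min(distinct_values, nodes, node_capacity) < 1:
        raise ValueError("all inputs must be positive integers")
    # Requestors per node needed to cover every partition at once...
    needed = math.ceil(distinct_values / nodes)
    # ...capped at what a single node can handle.
    return min(needed, node_capacity)


def processing_waves(distinct_values: int, nodes: int, per_node: int) -> int:
    """Rounds needed if each requestor handles one partition at a time (assumption)."""
    return math.ceil(distinct_values / (nodes * per_node))


per_node = requestors_per_node(distinct_values=20, nodes=3, node_capacity=4)
print(per_node)                           # 4 (capped by node capacity)
print(processing_waves(20, 3, per_node))  # 2 (12 partitions first, then the remaining 8)
print(requestors_per_node(6, 3, 4))       # 2 (3 * 2 = 6 requestors, one per partition)
```

With the figures from the example (20 values, 3 nodes, capacity 4), the cap applies and the run needs two rounds; with only 6 values, 2 requestors per node already cover every partition, so a higher setting would add nothing.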

Posted: 14 Jan 2025 11:26 EST

Phil5873

Realvalue Consulting

replied to Sairohith

@Sairohith Thank you for your clear response.

So it is correct to assume that the point is to balance, as closely as possible, the total number of available requestors against the number of distinct values of the partition key.

I will configure the Dataflow in this way.
