Dataflow: ratio requestor/partitions
Hi everyone,
I am running a dataflow sourced by a report definition, with a defined partition key.
Let's assume the partition key can have x possible distinct values, and the dataflow will run on y nodes.
My questions are the following:
1) In the options configuration panel, before launching the dataflow execution, what number of requestors should I set to maximise throughput?
2) Is it correct to assume that the number of requestors * number of nodes must be equal to or slightly greater than the number of possible distinct values of the partition key?
Thanks to anyone who can help me with this doubt.
@Phil5873 To maximize throughput when running a dataflow sourced by a report definition with a partition key, size the number of requestors from two things: the number of nodes and the capacity of each node. The goal is to process as many partitions in parallel as possible without overloading the system.

On your second question: yes, that is the right target. Ideally, requestors per node * number of nodes should be equal to or slightly greater than the number of distinct values of the partition key, as long as each node can sustain that many requestors. For example, with 20 distinct values and 3 nodes, 7 requestors per node (21 in total) would cover every partition at once. If each node only has capacity for 4 requestors, then 12 requestors (3 nodes * 4 requestors) is the practical ceiling, and the remaining partitions would wait until a requestor becomes free.

If the number of distinct values exceeds the total requestors, raise the requestor count only as far as node capacity allows, since going beyond it overloads the nodes instead of removing the bottleneck. If there are fewer partition values than requestors, lower the count, because the extra requestors have nothing to process and only consume resources.
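To make the arithmetic in the reply above concrete, here is a minimal Python sketch of the sizing rule. It is not a Pega API: the function name `suggest_requestors` and its parameters are hypothetical, and the per-node capacity is a figure you have to determine yourself from load testing. The `waves` value is only a rough estimate that assumes all partitions take about the same time to process.

```python
import math


def suggest_requestors(distinct_partitions: int, nodes: int, capacity_per_node: int) -> dict:
    """Hypothetical helper: pick a per-node requestor count for a partitioned run."""
    if min(distinct_partitions, nodes, capacity_per_node) < 1:
        raise ValueError("all inputs must be positive integers")

    # Requestors each node would need so that every partition runs at once.
    needed_per_node = math.ceil(distinct_partitions / nodes)

    # Never ask a node for more than it can sustain (assumed, user-supplied limit).
    per_node = min(needed_per_node, capacity_per_node)
    total = per_node * nodes

    # Rough number of passes if partitions outnumber requestors.
    waves = math.ceil(distinct_partitions / total)

    return {"per_node": per_node, "total": total, "waves": waves}


if __name__ == "__main__":
    # Figures from the example: 20 distinct values, 3 nodes, 4 requestors per node.
    print(suggest_requestors(20, 3, 4))
    # Same partitions, but nodes that can sustain 8 requestors each.
    print(suggest_requestors(20, 3, 8))
    # Fewer partitions than capacity: the count is adjusted downward.
    print(suggest_requestors(6, 3, 4))
```

With the figures from the example, the capacity limit wins: 4 requestors per node, 12 in total, so the 20 partitions cannot all run at the same time. With more headroom per node, the sketch stops at 7 per node (21 in total), which is the "equal to or slightly greater than" target from the question.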