It helps to look at the Data Flow run progress page -> component statistics.
Sort the table by the "% of total time" column and identify which types of components take the largest share of the time.
If a strategy comes out on top, the environment is short on CPU power, so improve:
1) Thread count: increase it, but limit the maximum to what the CPU cores allow. For example, with a 6-core CPU and 1 thread per core, the maximum DF thread count you could set would be 6x1 - 1 = 5.
2) Number of DF nodes: add more nodes.
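
The thread-count ceiling from item 1 can be written as a small calculation. This is only a sketch in Python: the function name max_df_threads and the "reserve one hardware thread" default are illustrative assumptions taken from the 6x1 - 1 = 5 example above, not a Pega API or setting.

```python
def max_df_threads(cores: int, threads_per_core: int = 1, reserved: int = 1) -> int:
    """Rough upper bound for the Data Flow thread count on one node.

    Hypothetical helper: it leaves `reserved` hardware threads free for the
    rest of the node, mirroring the "cores x threads per core - 1" rule.
    """
    if cores < 1 or threads_per_core < 1 or reserved < 0:
        raise ValueError("cores and threads_per_core must be >= 1, reserved >= 0")
    # Never go below one thread, even on a very small machine.
    return max(1, cores * threads_per_core - reserved)


# The example from the text: 6 cores, 1 thread per core.
print(max_df_threads(6, 1))  # 5
```

Treat the result as a ceiling to test against, not as the value to set straight away.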
If a DDS dataset comes out on top, investigate Cassandra: improve its CPU and disk I/O, and on the DF node side try a lower thread count and tune the batch size, number of DF nodes, etc.
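
The triage described above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the rows are made-up values typed in by hand from the component statistics table, and the type labels ("strategy", "dds_dataset", "dataset") and function names are hypothetical, not something Pega exports.

```python
from collections import defaultdict

# Hypothetical hints keyed by component type; the wording follows the advice above.
HINTS = {
    "strategy": "CPU-bound: raise the thread count (within the core limit) or add DF nodes",
    "dds_dataset": "Cassandra-bound: check Cassandra CPU and disk I/O; "
                   "try a lower thread count, tune batch size and number of DF nodes",
}


def top_component_type(rows):
    """Return the component type with the largest summed '% of total time'.

    rows: iterable of (component_name, component_type, pct_of_total_time).
    """
    totals = defaultdict(float)
    for _name, ctype, pct in rows:
        totals[ctype] += pct
    return max(totals, key=totals.get)


def suggest(rows):
    """Pair the dominant component type with a tuning hint."""
    ctype = top_component_type(rows)
    return ctype, HINTS.get(ctype, "no specific hint; inspect this component type manually")


# Made-up sample values, for illustration only.
sample = [
    ("Source", "dataset", 5.0),
    ("NBA Strategy", "strategy", 62.5),
    ("Customer history", "dds_dataset", 30.0),
    ("Destination", "dataset", 2.5),
]
print(suggest(sample))
```

Summing by type rather than taking the single top row is a design choice for this sketch, so that several medium-sized components of the same type are not overlooked.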
I suggest testing these changes in a load test environment to arrive at the optimal configuration settings for your requirements and the benchmarks you want to achieve.