When using a Data Join in DSM, is it more efficient to filter the input stream and the join stream prior to the join, or is it just as efficient to include the filter conditions in the join conditions of the Data Join?
Please see the attachment. What I'm asking is whether it's better to include filter components prior to the Data Join, or whether the filter conditions can be included in the Join Conditions with no performance impact, regardless of whether "Exclude source components..." is checked.
Both methods are functionally equivalent, but the latter uses fewer components. However, if it has a negative impact on performance, then we should discourage this practice.
Posted: 21 Jul 2017 4:01 EDT
Iolanda Da Costa Martins (dacoi)
Senior Principal Technical Writer
In theory, putting all the conditions in the Data Join is less efficient (in terms of strategy execution) than filtering before you join the data. Because filters reduce the amount of data passed to the next component, we generally apply filters as early as possible so that the strategy does not process unnecessary inputs.

You can also evaluate this yourself by testing the strategy against sample data with a significant number of records (100+). Make sure you have a data set or data flow that constructs the required input object, and use the data awareness capabilities of the strategy: in the batch test run settings, point to the data set or data flow and select the option to run the strategy to evaluate performance. You can then select different types of measurements to display on the canvas and compare the performance of the two design patterns.
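To illustrate the reasoning outside the product, here is a minimal Python sketch. It is not how a Data Join is implemented in DSM; it assumes a naive nested-loop join, and the record fields (`id`, `active`, `customer_id`, `score`) and the sample data are hypothetical. It only shows that filtering first gives the same result while the join evaluates fewer record pairs.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


def nested_loop_join(
    left: List[Record],
    right: List[Record],
    condition: Callable[[Record, Record], bool],
) -> Tuple[List[Tuple[Record, Record]], int]:
    """Naive join (an assumption, not the DSM engine).

    Returns the matched pairs and the number of condition evaluations.
    """
    evaluations = 0
    matches = []
    for l in left:
        for r in right:
            evaluations += 1
            if condition(l, r):
                matches.append((l, r))
    return matches, evaluations


# Hypothetical sample data: 100 input records, 300 join records.
customers = [{"id": i, "active": i % 2 == 0} for i in range(100)]
offers = [{"customer_id": i % 100, "score": i % 10} for i in range(300)]

# Pattern A: filter conditions folded into the join condition.
join_only, evals_join_only = nested_loop_join(
    customers,
    offers,
    lambda c, o: c["id"] == o["customer_id"] and c["active"] and o["score"] >= 5,
)

# Pattern B: filter both streams first, then join on the key alone.
active_customers = [c for c in customers if c["active"]]
good_offers = [o for o in offers if o["score"] >= 5]
filter_first, evals_filter_first = nested_loop_join(
    active_customers,
    good_offers,
    lambda c, o: c["id"] == o["customer_id"],
)

print("same result:", join_only == filter_first)
print("pairs evaluated, join only:", evals_join_only)
print("pairs evaluated, filter first:", evals_filter_first)
```

With this sample data, the join-only pattern evaluates 30,000 pairs and the filter-first pattern evaluates 7,500 (plus 400 cheap filter checks), for the same 60 matches. The real difference in a strategy depends on how the engine executes the join, which is why measuring with the batch test run is the reliable check.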