Question
Best Buy Co Inc.
US
Last activity: 19 Nov 2015 6:03 EST
How does Grouping for ADM really work?
I'm not sure how selecting a value between 0 and 1 tells the model how to group the predictors. What are the criteria for this grouping?
Best,
Jay Hysenbegasi
Accepted Solution
Pegasystems Inc.
NL
Hi Jay,
Those settings are usually best left alone; the defaults are fine for most (if not all) situations. They allow an analytics expert to tweak the models if the models don't behave as expected, but in practice this is very rare.
To give a bit of background: every predictor is divided into statistically significant groups. For example, take a predictor 'age'. It might be divided into the intervals 0-20 years, 20-45 years and 45-80 years, where the behavior of each group differs significantly from that of the others. In practice, that might mean that people in the 0-20 group are very unlikely to accept a certain offer, people in the 20-45 group are very likely to accept it, and the 45-80 group has an acceptance probability somewhere in between.
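A minimal sketch of what a grouped predictor amounts to once the intervals are fixed: each value of 'age' falls into exactly one interval, and the estimate for that customer is the observed acceptance rate of that interval. The `Group` class, the `find_group` helper and all counts are hypothetical illustrations, not ADM internals.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """One interval of a grouped predictor (hypothetical structure, not ADM's)."""
    lo: float      # inclusive lower bound
    hi: float      # exclusive upper bound
    accepts: int   # customers in this interval who accepted the offer
    total: int     # customers observed in this interval

    @property
    def propensity(self) -> float:
        # Observed acceptance rate of the interval
        return self.accepts / self.total if self.total else 0.0


def find_group(groups: list[Group], value: float) -> Group:
    """Return the group whose [lo, hi) interval contains value."""
    for g in groups:
        if g.lo <= value < g.hi:
            return g
    raise ValueError(f"{value} falls outside all groups")


# Hypothetical counts for the 0-20 / 20-45 / 45-80 example
age_groups = [
    Group(0, 20, accepts=5, total=200),     # very unlikely to accept
    Group(20, 45, accepts=120, total=400),  # most likely to accept
    Group(45, 80, accepts=45, total=300),   # somewhere in between
]

for age in (17, 30, 60):
    g = find_group(age_groups, age)
    print(f"age {age}: group {g.lo}-{g.hi}, propensity {g.propensity:.3f}")
```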
The 'grouping granularity' setting controls the threshold for what counts as statistically significant. A more granular setting could result in more groups, e.g. 0-12, 12-16, 16-24, 24-38, 38-65 and 65-80: six groups instead of three. More groups might give better predictions, but they also increase the risk of 'overfitting': if the groups become too granular, they start to reflect noise in the observed data and become less predictive for new customers.
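A minimal sketch of one way significance-driven grouping can work: start from fine-grained intervals and repeatedly merge the most similar adjacent pair while their acceptance rates are not statistically different. This is a ChiMerge-style bottom-up merge using a two-proportion z-test, not necessarily what ADM does internally. Treating 'grouping granularity' as the significance level is an assumption, chosen so that a higher value keeps more groups; `Bin`, `merge_bins` and the counts are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Bin:
    lo: float
    hi: float
    accepts: int
    total: int


def p_value_same_rate(a: Bin, b: Bin) -> float:
    """Two-sided p-value of a two-proportion z-test (same as a 2x2 chi-square test).

    A large p-value means the two acceptance rates are not significantly different.
    """
    pooled = (a.accepts + b.accepts) / (a.total + b.total)
    var = pooled * (1 - pooled) * (1 / a.total + 1 / b.total)
    if var == 0:
        return 1.0  # both bins all-accept or all-reject: no evidence of a difference
    z = (a.accepts / a.total - b.accepts / b.total) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def merge_bins(bins: list[Bin], granularity: float) -> list[Bin]:
    """Merge adjacent bins bottom-up.

    HYPOTHETICAL mapping: granularity is used as the significance level, so
    adjacent bins are merged while their p-value exceeds it. A higher
    granularity therefore stops merging earlier and keeps more groups.
    """
    bins = list(bins)
    while len(bins) > 1:
        # Find the adjacent pair that is most alike (highest p-value)
        pvals = [p_value_same_rate(bins[i], bins[i + 1]) for i in range(len(bins) - 1)]
        i = max(range(len(pvals)), key=pvals.__getitem__)
        if pvals[i] <= granularity:
            break  # every remaining boundary is statistically significant
        a, b = bins[i], bins[i + 1]
        bins[i : i + 2] = [Bin(a.lo, b.hi, a.accepts + b.accepts, a.total + b.total)]
    return bins


# Hypothetical counts for the six fine-grained age intervals
fine = [
    Bin(0, 12, 2, 100), Bin(12, 16, 3, 100), Bin(16, 24, 60, 200),
    Bin(24, 38, 66, 200), Bin(38, 65, 30, 200), Bin(65, 80, 14, 100),
]
for g in (0.01, 0.9):
    print(g, [(b.lo, b.hi) for b in merge_bins(fine, g)])
```

With these made-up counts, a granularity of 0.01 collapses the six intervals into three (0-16, 16-38, 38-80), while 0.9 keeps all six. This mirrors the trade-off described above, not the exact groups ADM would produce.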
For example, assume there is a group for 23-year-olds only, and that just a single 23-year-old customer was observed, who accepted the offer. The model would then assume that *all* 23-year-olds will accept the offer; i.e. the model no longer generalizes. The default settings usually give the best balance between predictive power and 'robustness' (resilience against overfitting).
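One way to see why a one-customer group is unreliable is to put an uncertainty interval around its acceptance rate. The sketch below uses the Wilson score interval, a standard statistics tool chosen here for illustration only; it is not something the ADM settings expose, and the 120-of-400 counts are hypothetical.

```python
import math


def wilson_interval(accepts: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval (about 95% for z=1.96) for an observed acceptance rate."""
    p = accepts / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# A single 23-year-old who accepted: the point estimate is 100%, but...
print(wilson_interval(1, 1))      # roughly (0.21, 1.0)
# ...a hypothetical well-populated 20-45 group with 120 of 400 accepting
print(wilson_interval(120, 400))  # roughly (0.26, 0.35)
```

The one-case group is consistent with anything from about 21% to 100% acceptance, so predicting that *all* 23-year-olds accept is not supported by the data.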
The 'minimum number of cases' setting controls the minimum fraction of cases that should end up in each group. For example, a value of 0.05 means that each group should contain at least 5% of all cases, which implies that there will be at most 20 groups (1 / 0.05).
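A minimal sketch of the minimum-cases constraint: the upper bound on the number of groups follows directly from the fraction, and undersized groups can be folded into a neighbor until every group is large enough. The merge rule (smallest group joins its smaller neighbor) and the `Interval`, `max_groups` and `enforce_min_cases` names are hypothetical, not ADM's actual procedure.

```python
import math
from dataclasses import dataclass


@dataclass
class Interval:
    lo: float
    hi: float
    total: int  # cases observed in this interval


def max_groups(min_fraction: float) -> int:
    # If every group must hold at least min_fraction of all cases,
    # no more than 1 / min_fraction groups fit (epsilon guards float error).
    return math.floor(1 / min_fraction + 1e-9)


def enforce_min_cases(bins: list[Interval], min_fraction: float) -> list[Interval]:
    """Merge undersized intervals until each holds >= min_fraction of all cases.

    HYPOTHETICAL rule: the smallest interval is merged with its smaller neighbor.
    """
    bins = list(bins)
    n = sum(b.total for b in bins)
    while len(bins) > 1:
        i = min(range(len(bins)), key=lambda k: bins[k].total)
        if bins[i].total >= min_fraction * n:
            break  # the smallest interval is large enough, so all of them are
        # Choose the smaller neighbor (only one exists at either end)
        if i == 0:
            j = 1
        elif i == len(bins) - 1:
            j = i - 1
        else:
            j = i - 1 if bins[i - 1].total <= bins[i + 1].total else i + 1
        a, b = sorted((i, j))
        bins[a : b + 1] = [Interval(bins[a].lo, bins[b].hi, bins[a].total + bins[b].total)]
    return bins


# Hypothetical data with a one-customer "23-year-olds only" interval
intervals = [
    Interval(0, 20, 300), Interval(20, 23, 150), Interval(23, 24, 1),
    Interval(24, 45, 249), Interval(45, 80, 300),
]
print(max_groups(0.05))
print([(b.lo, b.hi) for b in enforce_min_cases(intervals, 0.05)])
```

With 1,000 cases and a fraction of 0.05, every group needs at least 50 cases, so the single 23-year-old is absorbed into the 20-23 interval.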
Hope this helps,
-Danny
Updated: 17 Nov 2015 10:07 EST
Pegasystems Inc.
NL
Hi Jay,
I'm not sure I understand the question. The Settings tab of an Adaptive Model rule defines Data Analysis settings such as 'Grouping granularity' and 'Grouping minimum cases'.
Are you asking what the purpose of these settings is?
Kind regards,
-Danny
PS: here's a link to the related help page on PDN: https://community.pega.com/sites/default/files/help_v719/procomhelpmain.htm
Best Buy Co Inc.
US
Hi Danny,
Thanks for taking a look at my question.
Yes, I'm a bit confused about what should guide our decision to change this setting, i.e. what grouping structure to expect. It's not clear to me what I should base a value of, let's say, 0.5 on, or what PRPC does in the background to group the predictors I have listed based on this setting.
Best,
Jay Hysenbegasi