experimental::[]

{feat-imp-cap} values indicate which fields had the biggest impact on each
prediction that is generated by {classification} or {regression} analysis. Each
{feat-imp} value has both a magnitude and a direction (positive or negative),
which indicate how each field (or _feature_ of a data point) affects a
particular prediction.

The purpose of {feat-imp} is to help you determine whether the predictions are
sensible. Is the relationship between the dependent variable and the important
features supported by your domain knowledge? The lessons you learn about the
importance of specific features might also affect your decision to include them
in future iterations of your trained model.

You can see the average magnitude of the {feat-imp} values for each field across
all the training data in {kib} or by using the
{ref}/get-inference.html[get trained models API]. For example:

[role="screenshot"]
image::images/flights-regression-total-importance.png["Total {feat-imp} values for a {regression} {dfanalytics-job} in {kib}"]

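For the API route, a request such as the following returns the total {feat-imp}
values in the model metadata. This is an illustrative sketch: the model ID is
hypothetical, and it assumes a version of the get trained models API that
supports the `total_feature_importance` value for its `include` query parameter:

[source,console]
----
GET _ml/inference/model-flight-delays-regression?include=total_feature_importance
----
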
You can also examine the {feat-imp} values for each individual prediction. In
{kib}, you can see these values in JSON objects or decision plots:

[role="screenshot"]
image::images/flights-regression-decision-plot.png["Feature importance values for a {regression} {dfanalytics-job} in {kib}"]

For {reganalysis}, each decision plot starts at a shared baseline, which is the
average of the prediction values for all the data points in the training data
set. When you add all of the {feat-imp} values for a particular data point to
that baseline, you arrive at the numeric prediction value. If a {feat-imp}
value is negative, it reduces the prediction value. If a {feat-imp} value is
positive, it increases the prediction value.
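
For example, suppose the baseline is 95.0 and a data point has {feat-imp}
values of +30.5, -12.0, and +1.5 (all of these numbers are purely
illustrative). The prediction for that data point is:

----
prediction = baseline + sum of feature importance values
           = 95.0 + 30.5 + (-12.0) + 1.5
           = 115.0
----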

//TBD: Add section about classification analysis.

By default, {feat-imp} values are not calculated. To generate this information,
you must specify the `num_top_feature_importance_values` property when you
create a {dfanalytics-job}. For example, see <<flightdata-regression>>.
//and <<flightdata-classification>>.
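
As a reduced sketch, the following request enables {feat-imp} for a
{reganalysis} job. The index and job names are hypothetical, and the
configuration is trimmed to the essentials:

[source,console]
----
PUT _ml/data_frame/analytics/model-flight-delays
{
  "source": { "index": "kibana_sample_data_flights" },
  "dest": { "index": "df-flight-delays" },
  "analysis": {
    "regression": {
      "dependent_variable": "FlightDelayMin",
      "num_top_feature_importance_values": 5
    }
  }
}
----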

The {feat-imp} values are stored in the {ml} results field for each document in
the destination index. The number of {feat-imp} values for each document might
be less than the `num_top_feature_importance_values` property value, because
only the features that have a positive or negative effect on the prediction are
returned.
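
To see what this looks like in practice, the following sketch shows the general
shape of the {ml} results field for a single document from a {reganalysis}
destination index. The field names and values are hypothetical, and the exact
structure can vary by version and analysis type:

[source,js]
----
"ml" : {
  "FlightDelayMin_prediction" : 42.2,
  "feature_importance" : [
    { "feature_name" : "DistanceKilometers", "importance" : 10.9 },
    { "feature_name" : "Carrier", "importance" : -2.1 }
  ]
}
----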

[[ml-feature-importance-readings]]
== Further reading

{feat-imp-cap} in the {stack} is calculated using the SHAP (SHapley Additive
exPlanations) method as described in
https://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf[Lundberg, S. M., & Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In NeurIPS 2017].

See also
https://www.elastic.co/blog/feature-importance-for-data-frame-analytics-with-elastic-machine-learning[{feat-imp-cap} for {dfanalytics} with Elastic {ml}].