Commit 33e47ce

[DOCS] Add total feature importance (#1378)
1 parent 6075920 commit 33e47ce

File tree: 2 files changed (+47, -22 lines)

Image file (66.2 KB)

docs/en/stack/ml/df-analytics/ml-feature-importance.asciidoc

Lines changed: 47 additions & 22 deletions
@@ -5,32 +5,57 @@
 experimental::[]
 
 {feat-imp-cap} values indicate which fields had the biggest impact on each
-prediction that is generated by <<dfa-classification,{classification}>> or
-<<dfa-regression,{regression}>> analysis. The features of the data points are
-responsible for a particular prediction to varying degrees. {feat-imp-cap} shows
-to what degree a given feature of a data point contributes to the prediction.
-The {feat-imp} value can be either positive or negative depending on its effect
-on the prediction. If the feature reduces the prediction value, the {feat-imp}
-is negative, if it increases the prediction, then the {feat-imp} is positive.
-The magnitude of {feat-imp} shows how significantly the feature affects the
-prediction for a given data point.
+prediction that is generated by {classification} or {regression} analysis. Each
+{feat-imp} value has both a magnitude and a direction (positive or negative),
+which indicate how each field (or _feature_ of a data point) affects a
+particular prediction.
 
-{feat-imp-cap} in the {stack} is calculated using the SHAP (SHapley Additive
-exPlanations) method as described in
-https://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf[Lundberg, S. M., & Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In NeurIPS 2017].
+The purpose of {feat-imp} is to help you determine whether the predictions are
+sensible. Is the relationship between the dependent variable and the important
+features supported by your domain knowledge? The lessons you learn about the
+importance of specific features might also affect your decision to include them
+in future iterations of your trained model.
+
+You can see the average magnitude of the {feat-imp} values for each field across
+all the training data in {kib} or by using the
+{ref}/get-inference.html[get trained model API]. For example:
+
+[role="screenshot"]
+image::images/flights-regression-total-importance.png["Total {feat-imp} values for a {regression} {dfanalytics-job} in {kib}"]
+
+You can also examine the feature importance values for each individual
+prediction. In {kib}, you can see these values in JSON objects or decision plots:
+
+[role="screenshot"]
+image::images/flights-regression-decision-plot.png["Feature importance values for a {regression} {dfanalytics-job} in {kib}"]
 
-By default, {feat-imp} values are not calculated when you configure the job via
-the API. To generate this information, when you create a {dfanalytics-job} you
-must specify the `num_top_feature_importance_values` property. When you
-configure the job in {kib}, {feat-imp} values are calculated automatically. The
-{feat-imp} values are stored in the {ml} results field for each document in the
-destination index.
+For {reganalysis}, each decision plot starts at a shared baseline, which is
+the average of the prediction values for all the data points in the training
+data set. When you add all of the feature importance values for a particular
+data point to that baseline, you arrive at the numeric prediction value. If a
+{feat-imp} value is negative, it reduces the prediction value. If a {feat-imp}
+value is positive, it increases the prediction value.
 
-NOTE: The number of {feat-imp} values for each document might be less than the
-`num_top_feature_importance_values` property value. For example, it returns only
-features that had a positive or negative effect on the prediction.
+//TBD: Add section about classification analysis.
+
+By default, {feat-imp} values are not calculated. To generate this information,
+when you create a {dfanalytics-job} you must specify the
+`num_top_feature_importance_values` property. For example, see
+<<flightdata-regression>>.
+//and <<flightdata-classification>>.
+
+The {feat-imp} values are stored in the {ml} results field for each document in
+the destination index. The number of {feat-imp} values for each document might
+be less than the `num_top_feature_importance_values` property value. For example,
+it returns only features that had a positive or negative effect on the
+prediction.
 
 [[ml-feature-importance-readings]]
 == Further reading
 
-https://www.elastic.co/blog/feature-importance-for-data-frame-analytics-with-elastic-machine-learning[{feat-imp-cap} for {dfanalytics} with Elastic {ml}]
+{feat-imp-cap} in the {stack} is calculated using the SHAP (SHapley Additive
+exPlanations) method as described in
+https://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf[Lundberg, S. M., & Lee, S.-I. A Unified Approach to Interpreting Model Predictions. In NeurIPS 2017].
+
+See also
+https://www.elastic.co/blog/feature-importance-for-data-frame-analytics-with-elastic-machine-learning[{feat-imp-cap} for {dfanalytics} with Elastic {ml}].
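
The added text above points readers at the get trained model API for the average feature importance magnitudes. As a rough sketch of that call only: the snippet below assumes an unsecured local cluster and a hypothetical model ID, and the endpoint path (as well as where the per-field importance summary sits in the response) varies between stack versions.

[source,python]
----
import requests

ES = "http://localhost:9200"  # assumption: local, unsecured cluster

# Hypothetical model ID. Older stack versions expose trained models under
# _ml/inference, newer ones under _ml/trained_models; adjust the path to match.
MODEL_ID = "flights-regression-model"

resp = requests.get(f"{ES}/_ml/inference/{MODEL_ID}")
resp.raise_for_status()

for config in resp.json().get("trained_model_configs", []):
    # The per-field importance summary, when present, sits in the model
    # metadata; inspect the keys to find it for your version.
    print(config["model_id"], sorted(config.get("metadata", {}).keys()))
----

In Kibana, the same per-field averages are what the total feature importance chart in the referenced screenshot summarizes.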

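The added text also notes that feature importance values are only produced when `num_top_feature_importance_values` is set on the data frame analytics job. A minimal sketch of creating such a job over HTTP follows; the index names, job ID, and dependent variable are hypothetical, and only the `num_top_feature_importance_values` property comes from the documentation above.

[source,python]
----
import requests

ES = "http://localhost:9200"  # assumption: local, unsecured cluster

# Hypothetical regression job; every name except
# num_top_feature_importance_values is illustrative.
job_config = {
    "source": {"index": "flights"},
    "dest": {"index": "flights-regression-results"},
    "analysis": {
        "regression": {
            "dependent_variable": "FlightDelayMin",
            "num_top_feature_importance_values": 5,
        }
    },
}

resp = requests.put(
    f"{ES}/_ml/data_frame/analytics/flights-regression-example",
    json=job_config,
)
resp.raise_for_status()
print(resp.json())
----

The job must still be started (for example, `POST _ml/data_frame/analytics/flights-regression-example/_start`) before results, including feature importance values, are written to the destination index.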

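Finally, the added text explains that per-document feature importance values are stored in the ml results field of the destination index, and that for regression the shared baseline plus the sum of a document's feature importance values yields its prediction. The sketch below fetches one result document and reconstructs the prediction that way; the baseline value and the exact field names under `ml` are assumptions that depend on the job configuration and stack version.

[source,python]
----
import requests

ES = "http://localhost:9200"               # assumption: local, unsecured cluster
DEST_INDEX = "flights-regression-results"  # hypothetical destination index
BASELINE = 80.0  # assumption: average prediction value over the training data

resp = requests.post(f"{ES}/{DEST_INDEX}/_search", json={"size": 1})
resp.raise_for_status()

doc = resp.json()["hits"]["hits"][0]["_source"]
ml_results = doc["ml"]  # "ml" is the default results field name

# Assumed layout: a list of {"feature_name": ..., "importance": ...} objects.
# Inspect your own documents to confirm the shape for your version.
importance_sum = sum(
    item["importance"] for item in ml_results.get("feature_importance", [])
)

print("stored prediction:        ", ml_results.get("FlightDelayMin_prediction"))
print("baseline + importance sum:", BASELINE + importance_sum)
----

Because only the top `num_top_feature_importance_values` features are written for each document, the reconstructed value can differ slightly from the stored prediction.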