Adding functional tests for explainability #2584
Conversation
> }
>
> /// <summary>
> /// LocalFeatureImportance: Per-row feature importance can be computed through FeatureContributionCalculator for a linear model.
GAM (this summary still says "a linear model"). #Resolved
> }
>
> /// <summary>
> /// LocalFeatureImportance: Per-row feature importance can be computed through FeatureContributionCalculator for a linear model.
FastForest (this summary still says "a linear model"). #Resolved
> }
>
> /// <summary>
> /// LocalFeatureImportance: Per-row feature importance can be computed through FeatureContributionCalculator for a linear model.
FastTree (this summary still says "a linear model"). #Resolved
> var linearModel = model.LastTransformer.Model;
>
> var weights = linearModel.Weights;
nit: you commented the last step in all your examples; you could add a comment here saying that you are getting the weights and making sure there is the correct number of them. #Resolved
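For reference, a minimal sketch of what that commented step could look like, continuing the test body quoted above (it assumes an xunit `Assert`, that the linear model's `Weights` are exposed as an `IReadOnlyList<float>` as in current ML.NET, and the `HousingRegression.Features` array used elsewhere in these tests):

```csharp
// Extract the trained linear model from the last transformer in the pipeline.
var linearModel = model.LastTransformer.Model;

// Get the model weights and check that there is one weight per input feature.
var weights = linearModel.Weights;
Assert.Equal(HousingRegression.Features.Length, weights.Count);
```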
artidoro left a comment

🕐
artidoro left a comment
> namespace Microsoft.ML.Functional.Tests.Datasets
> {
>     /// <summary>
>     /// A schematized class for loading the HousingRegression dataset.
> A schematized class for loading the HousingRegression dataset.

Either the class name or the comment is incorrect.
> var transformedData = model.Transform(data);
>
> // Compute the permutation feature importance to look at global feature importance.
> var permutationMetrics = mlContext.Regression.PermutationFeatureImportance(model.LastTransformer, transformedData);
> permutationMetrics

Is there any way to know whether the PFI values are actually correct, meaning that important features were marked as such? Perhaps get a baseline or use a generated dataset. #Closed
The tests for PFI correctness are in Microsoft.ML.Tests. They validate that the importances produced by PFI are correct.
The purpose of Microsoft.ML.Functional.Tests is to guarantee that end-to-end scenarios work through public APIs, and that the results are returned in a way that makes sense: that metrics objects are returned, that the individual metrics are within the allowable range for each metric, etc. In other words, these are not meant to be baseline or correctness tests.
That is to say, if you fix a numerical bug in ML.NET, these tests should not fail. But if you change the output of an API, if metrics start returning nonsensical values, or if a scenario is no longer possible through public APIs, these tests should fail.
In reply to: 258589829
This test would fail to detect a case where the method returns a totally bogus positive value because of an API bug.

In reply to: 259082766
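To make the distinction above concrete, here is a rough sketch of the kind of shape check a functional test performs over the PFI output. It continues the test body quoted above and assumes the `Regression.PermutationFeatureImportance` overload that returns an `ImmutableArray<RegressionMetricsStatistics>`:

```csharp
// Compute PFI over the scored data; one entry comes back per slot of the feature vector.
var permutationMetrics = mlContext.Regression.PermutationFeatureImportance(
    model.LastTransformer, transformedData);
Assert.Equal(HousingRegression.Features.Length, permutationMetrics.Length);

foreach (var metricsStatistics in permutationMetrics)
{
    // PFI reports the change in each metric after permuting a feature;
    // a functional test only checks that the reported deltas are well-formed numbers.
    Assert.False(double.IsNaN(metricsStatistics.RSquared.Mean));
    Assert.False(double.IsNaN(metricsStatistics.RootMeanSquaredError.Mean));
}
```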
> {
>     var mlContext = new MLContext(seed: 1, conc: 1);
>
>     // Get the dataset
Missing a period at the end of the comment. #Closed
> var linearModel = model.LastTransformer.Model;
>
> // Make sure the number of model weights returned matches the length of the input feature vector.
> var weights = linearModel.Weights;
> Weights

Validate that the weights are reasonable, using a baseline or some other heuristic, not just that they are nonnegative. #Closed
See above comment about FunctionalTests vs. Correctness tests and Baseline tests.
In reply to: 258590457
> treeModel.GetFeatureWeights(ref weights);
>
> // Make sure the number of feature gains returned matches the length of the input feature vector.
> Assert.Equal(HousingRegression.Features.Length, weights.Length);
> Equal

Same issue: insufficient validation. #Closed
> // Fit the pipeline and transform the data.
> var model = pipeline.Fit(data);
> var scoredData = model.Transform(data);
This seems common to all the tests; factor it out. #Closed
I tried to do this, but unfortunately we no longer have a way to specify a generic model that can be used with the FeatureContributionCalculator, as we made all the interfaces internal.
In reply to: 258591290
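To illustrate the constraint: the repeated block looks roughly like the sketch below, but the concrete type of `model` (and therefore of `model.LastTransformer`) differs per trainer, and the interfaces a shared helper would need in order to accept them generically are internal. This is only a sketch, assuming the `CalculateFeatureContribution` transform in current ML.NET and the `pipeline`, `data`, and `mlContext` variables from the tests.

```csharp
// Repeated in each test: fit the concretely typed pipeline and score the data.
var model = pipeline.Fit(data);
var scoredData = model.Transform(data);

// The feature contribution calculator needs the concretely typed prediction transformer,
// so these lines cannot be moved into a generic helper.
var featureContributions = mlContext.Transforms.CalculateFeatureContribution(model.LastTransformer);
```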
> var model = pipeline.Fit(data);
> var scoredData = model.Transform(data);
>
> // Create a Feature Contribution Calculator
> Calculator

More dots (this comment is also missing its period). #Closed
> var scoringEnumerator = mlContext.CreateEnumerable<FeatureContributionOutput>(shuffledSubset, true);
>
> // Make sure the number of feature contributions returned matches the length of the input feature vector.
> foreach (var row in scoringEnumerator)
must. do. moar. validation. #Closed
> // Validate that the contributions are there
> var shuffledSubset = mlContext.Data.TakeRows(mlContext.Data.ShuffleRows(outputData), 10);
> var scoringEnumerator = mlContext.CreateEnumerable<FeatureContributionOutput>(shuffledSubset, true);
Validate the results. #Closed
> // Compute the contributions
> var outputData = featureContributions.Fit(scoredData).Transform(scoredData);
>
> // Validate that the contributions are there
> Validate that the contributions are there

Make sure the contributions are correct, not just that they have the right sign. #Closed
glebuk left a comment
Codecov Report
@@            Coverage Diff             @@
##           master    #2584      +/-   ##
==========================================
+ Coverage    71.5%   71.58%    +0.07%
==========================================
  Files         801      803        +2
  Lines      142023   141968       -55
  Branches    16147    16124       -23
==========================================
+ Hits       101557   101621       +64
+ Misses      35998    35907       -91
+ Partials     4468     4440       -28
This PR adds functional tests for explainability features. Namely, it tests the following scenarios:

- Global feature importance through PermutationFeatureImportance.
- Per-row (local) feature importance through the FeatureContributionCalculator for linear, GAM, FastForest, and FastTree models.
- Inspecting the weights of a trained linear model.
- Inspecting the feature gains of a trained tree model.

Fixes #2573
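As a rough end-to-end illustration of one of these scenarios, here is a self-contained sketch of a global-feature-importance test. The dataset path, the "MedianHomeValue" label column name, the tab-separated file layout, and the use of `HousingRegression.Features` as the list of feature column names are all assumptions for illustration, not the exact code in this PR.

```csharp
using Microsoft.ML;
using Microsoft.ML.Functional.Tests.Datasets;
using Xunit;

public class ExplainabilitySketch
{
    // Hypothetical path to the housing dataset used by these tests.
    private const string HousingDataPath = "housing.txt";

    [Fact]
    public void GlobalFeatureImportanceSketch()
    {
        var mlContext = new MLContext(seed: 1);

        // Load the housing data (assumed to be a tab-separated file with a header row).
        var data = mlContext.Data.LoadFromTextFile<HousingRegression>(
            HousingDataPath, hasHeader: true, separatorChar: '\t');

        // Featurize and train a linear regression model ("MedianHomeValue" is an assumed label column name).
        var pipeline = mlContext.Transforms
            .Concatenate("Features", HousingRegression.Features)
            .Append(mlContext.Regression.Trainers.Sdca(labelColumnName: "MedianHomeValue"));

        var model = pipeline.Fit(data);
        var transformedData = model.Transform(data);

        // Global importance: permute each feature slot and measure the change in the regression metrics.
        var permutationMetrics = mlContext.Regression.PermutationFeatureImportance(
            model.LastTransformer, transformedData, labelColumnName: "MedianHomeValue");

        // One result per slot of the feature vector.
        Assert.Equal(HousingRegression.Features.Length, permutationMetrics.Length);
    }
}
```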