
Conversation

@rogancarr (Contributor)

This PR adds functional tests for Explainability features. Namely, it tests the following scenarios:

  • I can get near-free (local) feature importance for scored examples (Feature Contributions)
  • I can view the overall importance of each feature (Permutation Feature Importance, GetFeatureWeights)
  • I can train interpretable models (linear model, GAM)
  • I can view how much each feature contributed to each prediction for trees and linear models (Feature Contributions)

Fixes #2573
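
For orientation, here is a minimal sketch of the PFI scenario, assembled from snippets quoted later in this thread. The pipeline and data variables stand in for the elided load-and-fit steps, and the final assertion is an assumption about what the test checks, not a quote from it:

// Create the ML context with a fixed seed, as the tests below do.
var mlContext = new MLContext(seed: 1, conc: 1);

// Fit a regression pipeline on the housing data and score it (construction elided).
var model = pipeline.Fit(data);
var transformedData = model.Transform(data);

// Compute the permutation feature importance to look at global feature importance.
var permutationMetrics = mlContext.Regression.PermutationFeatureImportance(
    model.LastTransformer, transformedData);

// PFI returns one set of permuted metrics per input feature.
Assert.Equal(HousingRegression.Features.Length, permutationMetrics.Length);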

}

/// <summary>
/// LocalFeatureImportance: Per-row feature importance can be computed through FeatureContributionCalculator for a linear model.
@rogancarr (Author) commented Feb 15, 2019

GAM #Resolved

}

/// <summary>
/// LocalFeatureImportance: Per-row feature importance can be computed through FeatureContributionCalculator for a linear model.
@rogancarr (Author) commented Feb 15, 2019

FastForest #Resolved

}

/// <summary>
/// LocalFeatureImportance: Per-row feature importance can be computed through FeatureContributionCalculator for a linear model.
@rogancarr (Author) commented Feb 15, 2019

FastTree #Resolved

var linearModel = model.LastTransformer.Model;

var weights = linearModel.Weights;

@artidoro commented Feb 19, 2019
nit: you commented the last step in all your examples; you could add a comment here too, saying that you are getting the weights and making sure there is the correct number of them. #Resolved
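
For reference, the resolved form of this snippet, as it appears later in the thread:

var linearModel = model.LastTransformer.Model;

// Make sure the number of model weights returned matches the length of the input feature vector.
var weights = linearModel.Weights;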

@artidoro left a comment

🕐

@artidoro left a comment

:shipit:

namespace Microsoft.ML.Functional.Tests.Datasets
{
/// <summary>
/// A schematized class for loading the HousingRegression dataset.
@glebuk commented Feb 20, 2019

A schematized class for loading the HousingRegression dataset.

Either the class name or the comment is incorrect.

var transformedData = model.Transform(data);

// Compute the permutation feature importance to look at global feature importance.
var permutationMetrics = mlContext.Regression.PermutationFeatureImportance(model.LastTransformer, transformedData);
@glebuk commented Feb 20, 2019

permutationMetrics

Is there any way to know whether the PFI results are actually correct, meaning that important features were marked as such? Perhaps get a baseline or use a generated dataset. #Closed

@rogancarr (Author) commented Feb 21, 2019

The tests for PFI correctness are in Microsoft.ML.Tests. They validate that the importances produced by PFI are correct.

The purpose of Microsoft.ML.Functional.Tests is to guarantee that end-to-end scenarios work through public APIs and that the results are returned in a way that makes sense: that metrics objects are returned, that the individual metrics are within the allowable range for the metric, and so on. In other words, these are not meant to be baseline or correctness tests.

That is to say, if you fix a numerical bug in ML.NET, these tests should not fail. But, if you change the output of an API, metrics start returning nonsensical values, or a scenario is no longer possible through public APIs, these tests should fail.
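
As a concrete illustration of the "allowable range" checks described above (a sketch; the metric property names are assumed from that era's RegressionMetrics, not quoted from the tests):

// Sanity checks of the kind a functional test makes: the metric exists and is in range.
Assert.True(metrics.Rms >= 0);      // Root-mean-square error is non-negative by definition.
Assert.True(metrics.RSquared <= 1); // R^2 is bounded above by 1.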



@glebuk commented Feb 21, 2019

This test would fail to detect an API bug in which the method returns a totally bogus positive value.



@rogancarr (Author) replied:

Offline conversation: We propose merging Functional tests and Baseline tests when we solve issue #2171, to move baseline tests off of the subcomponent infrastructure.



{
var mlContext = new MLContext(seed: 1, conc: 1);

// Get the dataset
@glebuk commented Feb 20, 2019

Nit: missing period on the comment. #Closed

var linearModel = model.LastTransformer.Model;

// Make sure the number of model weights returned matches the length of the input feature vector.
var weights = linearModel.Weights;
@glebuk commented Feb 20, 2019

Weights

Validate that the weights are reasonable, using a baseline or some other heuristic, not just that they are nonnegative. #Closed
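
A sketch of a slightly stronger heuristic along these lines (assuming the weights enumerate as floats; still a sanity check, not a baseline comparison):

// Every weight should be finite, and a trained model should have at least one non-zero weight.
foreach (var w in weights)
    Assert.False(float.IsNaN(w) || float.IsInfinity(w));
Assert.Contains(weights, w => w != 0);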

@rogancarr (Author) replied:

See the comment above about functional tests vs. correctness tests and baseline tests.



treeModel.GetFeatureWeights(ref weights);

// Make sure the number of feature gains returned matches the length of the input feature vector.
Assert.Equal(HousingRegression.Features.Length, weights.Length);
@glebuk commented Feb 20, 2019

Equal

Same issue: insufficient validation. #Closed


// Fit the pipeline and transform the data.
var model = pipeline.Fit(data);
var scoredData = model.Transform(data);
@glebuk commented Feb 20, 2019

This seems common to all the tests; factor it out. #Closed

@rogancarr (Author) replied:

I tried to do this, but unfortunately we no longer have a way to specify a generic model that can be used with the FeatureContributionCalculator, as we made all the interfaces internal.



@glebuk replied:

bummer



var model = pipeline.Fit(data);
var scoredData = model.Transform(data);

// Create a Feature Contribution Calculator
@glebuk commented Feb 20, 2019

Calculator

More dots (this comment is also missing its period). #Closed

var scoringEnumerator = mlContext.CreateEnumerable<FeatureContributionOutput>(shuffledSubset, true);

// Make sure the number of feature contributions returned matches the length of the input feature vector.
foreach (var row in scoringEnumerator)
@glebuk commented Feb 20, 2019

must. do. moar. validation. #Closed


// Validate that the contributions are there
var shuffledSubset = mlContext.Data.TakeRows(mlContext.Data.ShuffleRows(outputData), 10);
var scoringEnumerator = mlContext.CreateEnumerable<FeatureContributionOutput>(shuffledSubset, true);
@glebuk commented Feb 20, 2019

Validate the results. #Closed

// Compute the contributions
var outputData = featureContributions.Fit(scoredData).Transform(scoredData);

// Validate that the contributions are there
@glebuk commented Feb 20, 2019

Validate that the contributions are there

Make sure the contributions are correct, not just the right sign. #Closed
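
For the linear case a stricter check is possible: a linear model's unnormalized contribution for feature i is its weight times the feature value. A sketch, where the per-row features and contributions arrays are hypothetical names:

// Each contribution should equal weight * feature value, up to floating-point precision.
for (int i = 0; i < features.Length; i++)
    Assert.Equal(weights[i] * features[i], contributions[i], precision: 4);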

@glebuk left a comment

:shipit:

@codecov bot commented Feb 21, 2019

Codecov Report

Merging #2584 into master will increase coverage by 0.07%.
The diff coverage is 91.72%.

@@            Coverage Diff             @@
##           master    #2584      +/-   ##
==========================================
+ Coverage    71.5%   71.58%   +0.07%     
==========================================
  Files         801      803       +2     
  Lines      142023   141968      -55     
  Branches    16147    16124      -23     
==========================================
+ Hits       101557   101621      +64     
+ Misses      35998    35907      -91     
+ Partials     4468     4440      -28
Flag         Coverage Δ
#Debug       71.58% <91.72%> (+0.07%) ⬆️
#production  67.87% <ø> (+0.07%) ⬆️
#test        85.73% <91.72%> (+0.16%) ⬆️

1 similar comment

@rogancarr merged commit 512493a into dotnet:master on Feb 21, 2019
@rogancarr deleted the 2573_explainability_scenarios branch on February 21, 2019 21:29
@ghost locked this conversation as resolved and limited it to collaborators on Mar 24, 2022