
Conversation

@rogancarr (Contributor)

This PR adds functional tests for Explainability features. Namely, it tests the following scenarios:

  • I can get near-free (local) feature importance for scored examples (Feature Contributions)
  • I can view the overall importance of each feature (Permutation Feature Importance, GetFeatureWeights)
  • I can train interpretable models (linear model, GAM)
  • I can view how much each feature contributed to each prediction for trees and linear models (Feature Contributions)

Fixes #2573
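
For orientation, here is a minimal sketch of the PFI scenario, assembled from snippets quoted later in this thread. The pipeline and data variables stand in for the elided load-and-fit steps, and the final assertion is an assumption about what the test checks, not a quote from it:

// Create the ML context with a fixed seed, as the tests below do.
var mlContext = new MLContext(seed: 1, conc: 1);

// Fit a regression pipeline on the housing data and score it (construction elided).
var model = pipeline.Fit(data);
var transformedData = model.Transform(data);

// Compute the permutation feature importance to look at global feature importance.
var permutationMetrics = mlContext.Regression.PermutationFeatureImportance(
    model.LastTransformer, transformedData);

// PFI returns one set of permuted metrics per input feature.
Assert.Equal(HousingRegression.Features.Length, permutationMetrics.Length);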

}

/// <summary>
/// LocalFeatureImportance: Per-row feature importance can be computed through FeatureContributionCalculator for a linear model.
@rogancarr (Author) commented Feb 15, 2019

GAM #Resolved

}

/// <summary>
/// LocalFeatureImportance: Per-row feature importance can be computed through FeatureContributionCalculator for a linear model.
@rogancarr (Author) commented Feb 15, 2019

FastForest #Resolved

}

/// <summary>
/// LocalFeatureImportance: Per-row feature importance can be computed through FeatureContributionCalculator for a linear model.
@rogancarr (Author) commented Feb 15, 2019

FastTree #Resolved

var linearModel = model.LastTransformer.Model;

var weights = linearModel.Weights;

@artidoro commented Feb 19, 2019
nit: you commented the last step in all your examples; you could add a comment here too, saying that you are getting the weights and making sure there is the correct number of them. #Resolved
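
For reference, the resolved form of this snippet, as it appears later in the thread:

var linearModel = model.LastTransformer.Model;

// Make sure the number of model weights returned matches the length of the input feature vector.
var weights = linearModel.Weights;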

@artidoro left a comment

🕐

@artidoro left a comment

:shipit:

namespace Microsoft.ML.Functional.Tests.Datasets
{
/// <summary>
/// A schematized class for loading the HousingRegression dataset.
@glebuk commented Feb 20, 2019

A schematized class for loading the HousingRegression dataset.

Either the class name or the comment is incorrect.

var transformedData = model.Transform(data);

// Compute the permutation feature importance to look at global feature importance.
var permutationMetrics = mlContext.Regression.PermutationFeatureImportance(model.LastTransformer, transformedData);
@glebuk commented Feb 20, 2019

permutationMetrics

Is there any way to know whether the PFI results are actually correct, meaning that important features were marked as such? Perhaps get a baseline or use a generated dataset. #Closed

@rogancarr (Author) commented Feb 21, 2019

The tests for PFI correctness are in Microsoft.ML.Tests. They validate that the importances produced by PFI are correct.

The purpose of Microsoft.ML.Functional.Tests is to guarantee that end-to-end scenarios work through public APIs and that the results are returned in a way that makes sense: that metrics objects are returned, that the individual metrics are within the allowable range for the metric, and so on. In other words, these are not meant to be baseline or correctness tests.

That is to say, if you fix a numerical bug in ML.NET, these tests should not fail. But, if you change the output of an API, metrics start returning nonsensical values, or a scenario is no longer possible through public APIs, these tests should fail.
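
As a concrete illustration of the "allowable range" checks described above (a sketch; the metric property names are assumed from that era's RegressionMetrics, not quoted from the tests):

// Sanity checks of the kind a functional test makes: the metric exists and is in range.
Assert.True(metrics.Rms >= 0);      // Root-mean-square error is non-negative by definition.
Assert.True(metrics.RSquared <= 1); // R^2 is bounded above by 1.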



@glebuk commented Feb 21, 2019

This test would fail to detect an API bug in which the method returns a totally bogus positive value.



@rogancarr (Author) replied:

Offline conversation: We propose merging Functional tests and Baseline tests when we solve issue #2171, to move baseline tests off of the subcomponent infrastructure.



{
var mlContext = new MLContext(seed: 1, conc: 1);

// Get the dataset
@glebuk commented Feb 20, 2019

Nit: missing period on the comment. #Closed

var linearModel = model.LastTransformer.Model;

// Make sure the number of model weights returned matches the length of the input feature vector.
var weights = linearModel.Weights;
@glebuk commented Feb 20, 2019

Weights

Validate that the weights are reasonable, using a baseline or some other heuristic, not just that they are nonnegative. #Closed
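
A sketch of a slightly stronger heuristic along these lines (assuming the weights enumerate as floats; still a sanity check, not a baseline comparison):

// Every weight should be finite, and a trained model should have at least one non-zero weight.
foreach (var w in weights)
    Assert.False(float.IsNaN(w) || float.IsInfinity(w));
Assert.Contains(weights, w => w != 0);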

@rogancarr (Author) replied:

See the comment above about functional tests vs. correctness tests and baseline tests.



treeModel.GetFeatureWeights(ref weights);

// Make sure the number of feature gains returned matches the length of the input feature vector.
Assert.Equal(HousingRegression.Features.Length, weights.Length);
@glebuk commented Feb 20, 2019

Equal

Same issue: insufficient validation. #Closed


// Fit the pipeline and transform the data.
var model = pipeline.Fit(data);
var scoredData = model.Transform(data);
@glebuk commented Feb 20, 2019

This seems common to all the tests; factor it out. #Closed

@rogancarr (Author) replied:

I tried to do this, but unfortunately we no longer have a way to specify a generic model that can be used with the FeatureContributionCalculator, as we made all the interfaces internal.



@glebuk replied:

bummer



var model = pipeline.Fit(data);
var scoredData = model.Transform(data);

// Create a Feature Contribution Calculator
@glebuk commented Feb 20, 2019

Calculator

More dots (this comment is also missing its period). #Closed

var scoringEnumerator = mlContext.CreateEnumerable<FeatureContributionOutput>(shuffledSubset, true);

// Make sure the number of feature contributions returned matches the length of the input feature vector.
foreach (var row in scoringEnumerator)
@glebuk commented Feb 20, 2019

must. do. moar. validation. #Closed


// Validate that the contributions are there
var shuffledSubset = mlContext.Data.TakeRows(mlContext.Data.ShuffleRows(outputData), 10);
var scoringEnumerator = mlContext.CreateEnumerable<FeatureContributionOutput>(shuffledSubset, true);
@glebuk commented Feb 20, 2019

Validate the results. #Closed

// Compute the contributions
var outputData = featureContributions.Fit(scoredData).Transform(scoredData);

// Validate that the contributions are there
@glebuk commented Feb 20, 2019

Validate that the contributions are there

Make sure the contributions are correct, not just the right sign. #Closed
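
For the linear case a stricter check is possible: a linear model's unnormalized contribution for feature i is its weight times the feature value. A sketch, where the per-row features and contributions arrays are hypothetical names:

// Each contribution should equal weight * feature value, up to floating-point precision.
for (int i = 0; i < features.Length; i++)
    Assert.Equal(weights[i] * features[i], contributions[i], precision: 4);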

@glebuk left a comment

:shipit:

@codecov bot commented Feb 21, 2019

Codecov Report

Merging #2584 into master will increase coverage by 0.07%.
The diff coverage is 91.72%.

@@            Coverage Diff             @@
##           master    #2584      +/-   ##
==========================================
+ Coverage    71.5%   71.58%   +0.07%     
==========================================
  Files         801      803       +2     
  Lines      142023   141968      -55     
  Branches    16147    16124      -23     
==========================================
+ Hits       101557   101621      +64     
+ Misses      35998    35907      -91     
+ Partials     4468     4440      -28
Flag         Coverage Δ
#Debug       71.58% <91.72%> (+0.07%) ⬆️
#production  67.87% <ø> (+0.07%) ⬆️
#test        85.73% <91.72%> (+0.16%) ⬆️

1 similar comment

@rogancarr merged commit 512493a into dotnet:master on Feb 21, 2019
@rogancarr deleted the 2573_explainability_scenarios branch on February 21, 2019 21:29
@ghost locked this conversation as resolved and limited it to collaborators on Mar 24, 2022