-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Tree based trainers implement ICanGetSummaryAsIDataView #3892
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| /// <see cref="TrainedTreeEnsemble"/>. | ||
| /// </summary> | ||
| public abstract class TreeEnsembleModelParametersBasedOnRegressionTree : TreeEnsembleModelParameters | ||
| public abstract class TreeEnsembleModelParametersBasedOnRegressionTree : TreeEnsembleModelParameters, ICanGetSummaryAsIDataView |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need ICanGetSummaryAs...? IDataView RegressionTreeEnsembleAsIDataView(..) looks sufficient for generating a summary. #ByDesign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the interface that Summarize entrypoint uses to get the IDataView from a trainer.
See:
| public static CommonOutputs.SummaryOutput Summarize(IHostEnvironment env, SummarizePredictor.Input input) |
And:
| internal static IDataView GetSummaryAndStats(IHostEnvironment env, IPredictor predictor, RoleMappedSchema schema, out IDataView stats) |
81a5d39 to
9b0e097
Compare
| /// Used for the Summarize entrypoint. | ||
| /// </summary> | ||
| IDataView ICanGetSummaryAsIDataView.GetSummaryDataView(RoleMappedSchema schema) | ||
| => RegressionTreeBaseUtils.RegressionTreeEnsembleAsIDataView(Host, TrainedTreeEnsemble.Bias, TrainedTreeEnsemble.TreeWeights, TrainedTreeEnsemble.Trees); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indent
| [LightGBMFact] | ||
| public void LightGbmBinaryClassificationTestSummary() | ||
| { | ||
| var (pipeline, dataView) = GetBinaryClassificationPipeline(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetBinaryClassificationPipeline [](start = 39, length = 31)
Make sure that these work with the categorical split after one hot encoding.
codemzs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We reviewed this in person and mostly looks good. Signing off assuming you will be fix the output of the summary to not display negative values for node ids and instead use 2s complement so that it is easier to understand the tree structure.
d6bf2c4 to
78f4f86
Compare
Fixes #3755.
NimbusML did not have access to the details on the tree structure.
This PR implements the
ICanGetSummaryAsIDataViewinterface which is used in theSummarizeentrypoint to pass a summary of the model parameters to NimbusML in the form of anIDataView.I create a utility method that does the conversion from
RegressionTreeBasetoIDataViewwith special treatment forQuantileRegressionTreeswhich have additional information.In the
IDataView, each node has its own row and the columns correspond to the fields describing each node. To determine which tree the node belongs to there is aTreeIDcolumn.