Fix for trainer estimator metadata propagation #909
Conversation
/// <summary>
/// Normal metadata that we produce for score columns.
/// </summary>
protected static IEnumerable<SchemaShape.Column> MetadataForScoreColumn(bool isNormalized = false)
protected static IEnumerable<SchemaShape.Column> MetadataForScoreColu
Should this live in a utilities class, since we'll need to replicate it for the trainers that don't extend this base class? #Resolved
/// <summary>
/// Normal metadata that we produce for score columns.
/// </summary>
protected static IEnumerable<SchemaShape.Column> MetadataForScoreColumn(bool isNormalized = false)
isNormalized
Will this come from the trainerInfo, or is it always true for the Probability-type column? #Resolved
new SchemaShape.Column(DefaultColumnNames.PredictedLabel, SchemaShape.Column.VectorKind.Scalar, BoolType.Instance, false)
new SchemaShape.Column(DefaultColumnNames.Score, SchemaShape.Column.VectorKind.Scalar, NumberType.R4, false, new SchemaShape(MetadataForScoreColumn())),
new SchemaShape.Column(DefaultColumnNames.Probability, SchemaShape.Column.VectorKind.Scalar, NumberType.R4, false, new SchemaShape(MetadataForScoreColumn(true))),
new SchemaShape.Column(DefaultColumnNames.PredictedLabel, SchemaShape.Column.VectorKind.Scalar, BoolType.Instance, false, new SchemaShape(MetadataForScoreColumn()))
Shape.Column(DefaultColumnNames.PredictedLabel
Is it intentional not to also use the input label metadata? #Resolved
Well, the input label metadata is not passed through. I'm merely trying to codify the existing behavior of the trainer, which has not been codified before.
In reply to: 217574282
Curious: why do you concatenate the label metadata for SDCA multiclass, then? To propagate the info about the string-to-category conversion?
In reply to: 217769414
SDCA propagates key value metadata; that's what I'm doing there. And I think all the multiclass learners do (or should).
In reply to: 217781701
new SchemaShape.Column(DefaultColumnNames.PredictedLabel, SchemaShape.Column.VectorKind.Scalar, BoolType.Instance, false)
new SchemaShape.Column(DefaultColumnNames.Score, SchemaShape.Column.VectorKind.Scalar, NumberType.R4, false, new SchemaShape(MetadataForScoreColumn())),
new SchemaShape.Column(DefaultColumnNames.Probability, SchemaShape.Column.VectorKind.Scalar, NumberType.R4, false, new SchemaShape(MetadataForScoreColumn(true))),
new SchemaShape.Column(DefaultColumnNames.PredictedLabel, SchemaShape.Column.VectorKind.Scalar, BoolType.Instance, false, new SchemaShape(MetadataForScoreColumn()))
MetadataForScoreColumn
Should this just be called Metadata, since it is used for all output columns? #Resolved
cols.Add(new SchemaShape.Column(MetadataUtils.Kinds.ScoreColumnKind, SchemaShape.Column.VectorKind.Scalar, TextType.Instance, false));
cols.Add(new SchemaShape.Column(MetadataUtils.Kinds.ScoreValueKind, SchemaShape.Column.VectorKind.Scalar, TextType.Instance, false));
if (isNormalized)
    cols.Add(new SchemaShape.Column(MetadataUtils.Kinds.IsNormalized, SchemaShape.Column.VectorKind.Scalar, BoolType.Instance, false));
On the Score column generated from the OVA tests, I see yet another column: "Slot Names".
Does that need to be added to the list of metadata here? #Resolved
sfilipi left a comment
Added tests for metadata propagation on existing trainers, also fixed SDCA to pass metadata correctly.