Remove duplicate NormalizeFeatures from FFM trainer #2964
Conversation
| /// Whether to normalize the input vectors so that the concatenation of all fields' feature vectors is unit-length. | ||
| /// </summary> | ||
| [Argument(ArgumentType.AtMostOnce, HelpText = "Whether to normalize the input vectors so that the concatenation of all fields' feature vectors is unit-length", ShortName = "norm", SortOrder = 6)] | ||
| public bool Normalize = true; |
Normalize
I would double check with @wschin #Resolved
wschin
left a comment
Wait... @ganik discussed it with me offline, but it seems I reached the wrong conclusion. Let me think about it again.
Sure. #Closed
| _learningRate = options.LearningRate; | ||
| _numIterations = options.NumberOfIterations; | ||
| _norm = options.Normalize; | ||
| _norm = (options.NormalizeFeatures == NormalizeOption.Yes); |
NormalizeFeatures
Just curious: why did you decide not to have NormalizeOption.Auto here? #Resolved
By default, normalization should be off, as per Wei-Sheng.
In reply to: 266086395
Ivanidzo4ka
left a comment
Please don't merge yet! #Closed
Codecov Report
@@            Coverage Diff            @@
##           master    #2964    +/-   ##
==========================================
+ Coverage   72.29%   72.29%   +<.01%
==========================================
  Files         796      796
  Lines      142349   142354      +5
  Branches    16051    16052      +1
==========================================
+ Hits       102905   102911      +6
  Misses      35063    35063
+ Partials     4381     4380      -1
Sure, I put WIP in the PR title; feel free to take it out after you're done. #Closed
| // duplicate name skipped are always in the same correct order. | ||
| // Same name field can bubble up from base class even though | ||
| // it's overridden / hidden, skip it. | ||
| if (collectedFields.Contains(name)) |
If there are two Features columns and the second one is hidden, which one will be picked up here? I couldn't see any mechanism to filter out hidden fields. Am I missing something? #Resolved
This is not about columns; this is about fields in a class. Did you mean fields? If so, please read the extensive comment I wrote at the top.
In reply to: 266176048
Sorry. Please replace columns with fields in my comment.
In reply to: 266176509
Yes, I got it; please see the comment I wrote at the top.
In reply to: 266176749
Do you think this comment is clearer?
// The fields returned by GetFields are stably ordered starting from derived class' fields
// and ending up with ones from base classes. Therefore,
// unit tests to compare manifest are passing as long as the new field's type and name are identical to
// what it overrides. For the same reason,
// we can always skip the base field overridden by another field introduced by "new".
In reply to: 266176749
It's not exactly correct. I am not sure that fields are returned for derived classes first and then for base classes; I didn't make such a statement. Moreover, the documentation for GetFields() says that fields may be returned in any order. The only statements I am making are:
1. The order is stable (meaning the fields always come back in the same order).
2. For the "currently duplicate" field (there is only one in our case, NormalizeFeatures), the order is "correct" with the fix; that is, the derived class's field is returned first and then the base one.
Note the difference: I do not state that, in general, derived-class fields are returned first and then base ones.
In reply to: 266179947
Talked to Wei-Sheng offline; this looks good to him as it is. Thank you.
In reply to: 266546642
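The field-hiding behaviour discussed in this thread can be illustrated with a small, self-contained sketch. `BaseOptions`, `FfmOptions`, and `FieldCollector` are hypothetical names used only for illustration, not types from the PR. Note also that this is a different technique from the check in the diff above: rather than skipping a name once it has been collected (which relies on the observed return order), it explicitly prefers the declaration on the most-derived class, so it does not depend on the order in which `GetFields` returns fields.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical stand-ins for a trainer's base and derived option classes.
public class BaseOptions
{
    public int NormalizeFeatures = 0;   // base declaration, hidden below
    public float LearningRate = 0.1f;
}

public class FfmOptions : BaseOptions
{
    // "new" hides the base field; reflection still reports both declarations.
    public new bool NormalizeFeatures = true;
}

public static class FieldCollector
{
    // Returns one FieldInfo per field name, preferring the declaration
    // on the most-derived class when a name appears more than once.
    public static Dictionary<string, FieldInfo> Collect(Type type)
    {
        var byName = new Dictionary<string, FieldInfo>();
        foreach (FieldInfo field in type.GetFields(BindingFlags.Public | BindingFlags.Instance))
        {
            if (byName.TryGetValue(field.Name, out FieldInfo existing)
                && existing.DeclaringType.IsSubclassOf(field.DeclaringType))
            {
                continue; // the stored entry is more derived; skip the hidden base field
            }
            byName[field.Name] = field;
        }
        return byName;
    }
}
```

With these assumed classes, `FieldCollector.Collect(typeof(FfmOptions))` keeps a single `NormalizeFeatures` entry, the one declared on `FfmOptions`, alongside `LearningRate`.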
| /// </summary> | ||
| [Argument(ArgumentType.AtMostOnce, HelpText = "Whether to normalize the input vectors so that the concatenation of all fields' feature vectors is unit-length", ShortName = "norm", SortOrder = 6)] | ||
| public bool Normalize = true; | ||
| public new bool NormalizeFeatures = true; |
Why do we need to hide it? Could we set the underlying field to always be false and expose Normalize as is? I feel your previous solution is just what we need (almost). #Closed
Then there will be two Normalize* input fields in manifest.json, and confusion over which one to use.
In reply to: 266176252
No one cares about entry points this week or next, right? I vaguely remember that filtering out hidden fields is a bit difficult, so I want to make your life easier.
In reply to: 266176427
Yes, it's difficult to filter out hidden fields, but it's currently not needed in this PR. It would be good to fix this, as NimbusML depends on manifest.json. This duplication would propagate confusion into the set of params for the FFM class in NimbusML, as well as the docs.
In reply to: 266176837
Fixes #2958. Two flags control normalization: one applied right before the FFM trainer and one inside it. We decided to disable the former because FFM has its own built-in normalization for multiple feature columns, whereas the other normalization works only with a single feature column.
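To make the built-in behaviour concrete, here is a minimal sketch of normalizing several feature columns so that their concatenation is unit-length, as the option's help text describes. This is not ML.NET's actual implementation; `FieldNormalizer` and `NormalizeAcrossFields` are hypothetical names, and the handling of all-zero input is an assumption.

```csharp
using System;

public static class FieldNormalizer
{
    // Scales every field's feature vector by the same factor so that the
    // concatenation of all vectors has L2 norm 1. Returns new arrays and
    // leaves the input untouched.
    public static float[][] NormalizeAcrossFields(float[][] fields)
    {
        // One norm is computed over ALL fields, not one norm per field.
        double sumOfSquares = 0.0;
        foreach (float[] field in fields)
            foreach (float value in field)
                sumOfSquares += (double)value * value;

        double norm = Math.Sqrt(sumOfSquares);
        var result = new float[fields.Length][];
        for (int i = 0; i < fields.Length; i++)
        {
            result[i] = new float[fields[i].Length];
            for (int j = 0; j < fields[i].Length; j++)
            {
                // Assumption: all-zero input is returned unchanged to avoid dividing by zero.
                result[i][j] = norm > 0 ? (float)(fields[i][j] / norm) : fields[i][j];
            }
        }
        return result;
    }
}
```

A per-column normalizer would instead scale each column to unit length on its own, which is why it does not give the same result when a trainer consumes more than one feature column.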