Conversation

@ganik (Member) commented Mar 14, 2019

Fixes #2958. There are two flags controlling normalization: one applied right before the FFM trainer, and one inside it. We decided to disable the former, because FFM has its own built-in normalization that handles multiple feature columns, while the outer normalization only works with a single feature column.

/// <summary>
/// Whether to normalize the input vectors so that the concatenation of all fields' feature vectors is unit-length.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "Whether to normalize the input vectors so that the concatenation of all fields' feature vectors is unit-length", ShortName = "norm", SortOrder = 6)]
public bool Normalize = true;
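For context, FFM's built-in normalization scales each example so that the concatenation of all fields' feature vectors has unit length. A minimal Python sketch of that idea (illustrative only, not the ML.NET implementation):

```python
import math

def normalize_fields(fields):
    """Scale every field vector by one shared factor so that the
    concatenation of all field vectors has unit L2 length."""
    norm = math.sqrt(sum(x * x for vec in fields for x in vec))
    if norm == 0.0:
        return fields  # all-zero example: nothing to scale
    return [[x / norm for x in vec] for vec in fields]

# Two feature columns ("fields"); their concatenation is (3, 0, 0, 4).
fields = [[3.0, 0.0], [0.0, 4.0]]
normalized = normalize_fields(fields)
```

Because this scaling already spans all fields, running the single-column normalizer on top of it is redundant, which is what this PR removes.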
@Ivanidzo4ka (Contributor) Mar 14, 2019

`Normalize`

I would double check with @wschin #Resolved

@ganik (Member Author)

yep, did


In reply to: 265801014

@ganik requested a review from wschin March 14, 2019 23:14
@wschin (Member) left a comment

Wait... @ganik discussed it with me offline but it seems I reached a wrong conclusion. Let me think about it again..

@ganik (Member Author) commented Mar 14, 2019

Wait... @ganik discussed it with me offline but it seems I reached a wrong conclusion. Let me think about it again..

sure #Closed

_learningRate = options.LearningRate;
_numIterations = options.NumberOfIterations;
_norm = options.Normalize;
_norm = (options.NormalizeFeatures == NormalizeOption.Yes);
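The replacement line maps the shared tri-state NormalizeFeatures option onto FFM's internal boolean: only an explicit Yes enables normalization, so the Auto default leaves it off. A Python analog of that mapping (the enum values are assumptions for illustration, not the ML.NET definitions):

```python
from enum import Enum

class NormalizeOption(Enum):
    # Hypothetical mirror of ML.NET's tri-state normalization option.
    No = 0
    Yes = 1
    Auto = 2

def to_norm_flag(option):
    """Only an explicit Yes enables FFM's normalization; Auto and No do not."""
    return option is NormalizeOption.Yes
```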
@Ivanidzo4ka (Contributor) Mar 15, 2019

`NormalizeFeatures`

Just curious, why did you decide not to have NormalizeOption.Auto here? #Resolved

@ganik (Member Author)

By default, normalization should be off, as per Wei-Sheng.


In reply to: 266086395

@Ivanidzo4ka (Contributor)

public bool Normalize = true;
So this was a mistake which we are fixing.
OK, if @wschin says so.


In reply to: 266129940

@Ivanidzo4ka (Contributor) left a comment

:shipit:

@ganik requested a review from TomFinley March 15, 2019 22:43
@ganik (Member Author) commented Mar 15, 2019

Pls don't merge yet! #Closed

@codecov (bot) commented Mar 15, 2019

Codecov Report

Merging #2964 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2964      +/-   ##
==========================================
+ Coverage   72.29%   72.29%   +<.01%     
==========================================
  Files         796      796              
  Lines      142349   142354       +5     
  Branches    16051    16052       +1     
==========================================
+ Hits       102905   102911       +6     
  Misses      35063    35063              
+ Partials     4381     4380       -1
| Flag | Coverage | Δ |
|------|----------|---|
| #Debug | 72.29% <100%> | (ø) ⬆️ |
| #production | 68.01% <100%> | (ø) ⬆️ |
| #test | 88.48% <ø> | (ø) ⬆️ |

| Impacted Files | Coverage | Δ |
|----------------|----------|---|
| ...soft.ML.EntryPoints/JsonUtils/JsonManifestUtils.cs | 83.66% <100%> | (+0.23%) ⬆️ |
| ...actorizationMachine/FactorizationMachineTrainer.cs | 88.47% <100%> | (ø) ⬆️ |
| src/Microsoft.ML.Transforms/CategoricalCatalog.cs | 100% <0%> | (ø) ⬆️ |
| src/Microsoft.ML.Data/Training/TrainerInputBase.cs | 100% <0%> | (ø) ⬆️ |
| ...icrosoft.ML.FastTree/RandomForestClassification.cs | 78.07% <0%> | (ø) ⬆️ |
| ...soft.ML.FastTree/Training/EarlyStoppingCriteria.cs | 71.73% <0%> | (ø) ⬆️ |
| src/Microsoft.ML.FastTree/TreeTrainersCatalog.cs | 94.18% <0%> | (ø) ⬆️ |
| src/Microsoft.ML.FastTree/FastTreeRanking.cs | 48.19% <0%> | (ø) ⬆️ |
| src/Microsoft.ML.FastTree/FastTreeTweedie.cs | 56.29% <0%> | (ø) ⬆️ |
| src/Microsoft.ML.FastTree/FastTreeRegression.cs | 54.5% <0%> | (ø) ⬆️ |
| ... and 8 more | | |
#Closed

@Ivanidzo4ka changed the title from "Remove duplicate NormalizeFeatures from FFM trainer" to "WIP Remove duplicate NormalizeFeatures from FFM trainer" Mar 15, 2019
@Ivanidzo4ka (Contributor) commented Mar 15, 2019

Pls don't merge yet!

Sure, I put WIP into the PR title; feel free to take it out after you're done. #Closed

// Fields skipped as duplicate names are always skipped in the same, correct order.
// A field with the same name can bubble up from a base class even though
// it is overridden / hidden; skip it.
if (collectedFields.Contains(name))
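The skip above keeps only the first field seen per name. A Python analog of that first-wins walk (all names here are hypothetical; the real code inspects C# reflection metadata):

```python
def collect_fields(classes_fields):
    """Walk field dicts in a stable order (derived class first, then bases),
    keeping only the first field seen for each name."""
    collected = {}
    for fields in classes_fields:
        for name, info in fields.items():
            if name in collected:
                # Duplicate bubbled up from a base class: skip it.
                continue
            collected[name] = info
    return collected

# The derived class hides the base field of the same name via "new".
derived = {"NormalizeFeatures": "bool, default false"}
base = {"NormalizeFeatures": "NormalizeOption", "Caching": "CachingOptions"}
collected = collect_fields([derived, base])
```

As discussed in this thread, the approach relies only on the enumeration order being stable and correct for the one duplicated field, not on a general derived-before-base guarantee.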
@wschin (Member) Mar 15, 2019

If there are two Features columns and the second one is hidden, which one will be picked up here? I couldn't see any mechanism to filter out hidden fields. Am I missing something? #Resolved

@ganik (Member Author) Mar 15, 2019

This is not about columns; this is about fields in a class. Did you mean fields? If yes, then please read the extensive comment I wrote on top.


In reply to: 266176048

@wschin (Member)

Sorry. Please replace columns with fields in my comment.


In reply to: 266176509

@ganik (Member Author)

Yes, I got it; please see the comment I wrote on top.


In reply to: 266176749

@wschin (Member) Mar 16, 2019

Do you think this comment is clearer?

                // The fields returned by GetFields are stably ordered, starting from the derived class's fields
                // and ending with ones from base classes. Therefore,
                // unit tests that compare manifests keep passing as long as the new field's type and name are identical to
                // what it overrides. For the same reason,
                // we can always skip the base field overridden by another field introduced by "new".

In reply to: 266176749 [](ancestors = 266176749,266176509,266176048)

@ganik (Member Author)

It's not exactly correct. I am not sure that fields are returned first for derived classes and then for base classes; I didn't make such a statement. Moreover, the documentation for GetFields() says that fields may be returned in any order. The only statements I am making are: 1. The order is stable (always the same order of fields). 2. For the currently duplicated field (and there is only one in our case, NormalizeFeatures), the order is "correct" with the fix, i.e. the derived field is returned first and then the base one. Note the difference: I do not state in general that derived fields are returned first and then base ones.


In reply to: 266179947 [](ancestors = 266179947,266176749,266176509,266176048)

@ganik (Member Author)

Talked to Wei-Sheng offline; this looks good to him as it is. Thank you.


In reply to: 266546642

/// <summary>
/// Whether to normalize the input vectors so that the concatenation of all fields' feature vectors is unit-length.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "Whether to normalize the input vectors so that the concatenation of all fields' feature vectors is unit-length", ShortName = "norm", SortOrder = 6)]
public bool Normalize = true;
public new bool NormalizeFeatures = true;
@wschin (Member) Mar 15, 2019

Why do we need to hide it? Could we set the underlying field to always be false and expose Normalize as is? I feel your previous solution is just what we need (almost). #Closed

@ganik (Member Author) Mar 15, 2019

Then there will be two Normalize* input fields mentioned in manifest.json, and confusion over which one to use.


In reply to: 266176252

@wschin (Member) Mar 15, 2019

No one cares about entry points this week and next, right? I roughly remember that filtering out hidden fields is a bit difficult, so I want to make your life easier.


In reply to: 266176427

@ganik (Member Author)

Yes, it's difficult to filter out hidden fields, but it's currently not needed in this PR. It would be good to fix this, though, as NimbusML depends on manifest.json; this duplication will propagate confusion into the set of params for the FFM class in NimbusML, as well as into the docs.


In reply to: 266176837

@ganik changed the title from "WIP Remove duplicate NormalizeFeatures from FFM trainer" to "Remove duplicate NormalizeFeatures from FFM trainer" Mar 18, 2019
@ganik merged commit 0835c52 into dotnet:master Mar 18, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022