Conversation

@ganik (Member) commented Mar 14, 2019

Fixes #2958. There are two flags controlling normalization: one applied right before the FFM trainer, and one inside it. We decided to disable the former, because FFM has its own built-in normalization that handles multiple feature columns, while the outer normalization only works with a single feature column.

/// <summary>
/// Whether to normalize the input vectors so that the concatenation of all fields' feature vectors is unit-length.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "Whether to normalize the input vectors so that the concatenation of all fields' feature vectors is unit-length", ShortName = "norm", SortOrder = 6)]
public bool Normalize = true;
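For context, FFM's built-in normalization scales each example so that the concatenation of all fields' feature vectors has unit length. A minimal Python sketch of that idea (illustrative only, not the ML.NET implementation):

```python
import math

def normalize_fields(fields):
    """Scale every field vector by one shared factor so that the
    concatenation of all field vectors has unit L2 length."""
    norm = math.sqrt(sum(x * x for vec in fields for x in vec))
    if norm == 0.0:
        return fields  # all-zero example: nothing to scale
    return [[x / norm for x in vec] for vec in fields]

# Two feature columns ("fields"); their concatenation is (3, 0, 0, 4).
fields = [[3.0, 0.0], [0.0, 4.0]]
normalized = normalize_fields(fields)
```

Because this scaling already spans all fields, running the single-column normalizer on top of it is redundant, which is what this PR removes.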
@Ivanidzo4ka (Contributor) Mar 14, 2019

`Normalize`

I would double check with @wschin #Resolved

@ganik (Member Author)

yep, did


In reply to: 265801014

@ganik requested a review from wschin March 14, 2019 23:14
@wschin (Member) left a comment

Wait... @ganik discussed it with me offline but it seems I reached a wrong conclusion. Let me think about it again..

@ganik (Member Author) commented Mar 14, 2019

Wait... @ganik discussed it with me offline but it seems I reached a wrong conclusion. Let me think about it again..

sure #Closed

_learningRate = options.LearningRate;
_numIterations = options.NumberOfIterations;
_norm = options.Normalize;
_norm = (options.NormalizeFeatures == NormalizeOption.Yes);
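The replacement line maps the shared tri-state NormalizeFeatures option onto FFM's internal boolean: only an explicit Yes enables normalization, so the Auto default leaves it off. A Python analog of that mapping (the enum values are assumptions for illustration, not the ML.NET definitions):

```python
from enum import Enum

class NormalizeOption(Enum):
    # Hypothetical mirror of ML.NET's tri-state normalization option.
    No = 0
    Yes = 1
    Auto = 2

def to_norm_flag(option):
    """Only an explicit Yes enables FFM's normalization; Auto and No do not."""
    return option is NormalizeOption.Yes
```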
@Ivanidzo4ka (Contributor) Mar 15, 2019

`NormalizeFeatures`

Just curious, why did you decide not to have NormalizeOption.Auto here? #Resolved

@ganik (Member Author)

By default, normalization should be off, as per Wei-Sheng.


In reply to: 266086395

@Ivanidzo4ka (Contributor)

public bool Normalize = true;
So this was a mistake which we are fixing.
OK, if @wschin says so.


In reply to: 266129940

@Ivanidzo4ka (Contributor) left a comment

:shipit:

@ganik requested a review from TomFinley March 15, 2019 22:43
@ganik (Member Author) commented Mar 15, 2019

Pls don't merge yet! #Closed

@codecov (bot) commented Mar 15, 2019

Codecov Report

Merging #2964 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2964      +/-   ##
==========================================
+ Coverage   72.29%   72.29%   +<.01%     
==========================================
  Files         796      796              
  Lines      142349   142354       +5     
  Branches    16051    16052       +1     
==========================================
+ Hits       102905   102911       +6     
  Misses      35063    35063              
+ Partials     4381     4380       -1
| Flag | Coverage | Δ |
|------|----------|---|
| #Debug | 72.29% <100%> | (ø) ⬆️ |
| #production | 68.01% <100%> | (ø) ⬆️ |
| #test | 88.48% <ø> | (ø) ⬆️ |

| Impacted Files | Coverage | Δ |
|----------------|----------|---|
| ...soft.ML.EntryPoints/JsonUtils/JsonManifestUtils.cs | 83.66% <100%> | (+0.23%) ⬆️ |
| ...actorizationMachine/FactorizationMachineTrainer.cs | 88.47% <100%> | (ø) ⬆️ |
| src/Microsoft.ML.Transforms/CategoricalCatalog.cs | 100% <0%> | (ø) ⬆️ |
| src/Microsoft.ML.Data/Training/TrainerInputBase.cs | 100% <0%> | (ø) ⬆️ |
| ...icrosoft.ML.FastTree/RandomForestClassification.cs | 78.07% <0%> | (ø) ⬆️ |
| ...soft.ML.FastTree/Training/EarlyStoppingCriteria.cs | 71.73% <0%> | (ø) ⬆️ |
| src/Microsoft.ML.FastTree/TreeTrainersCatalog.cs | 94.18% <0%> | (ø) ⬆️ |
| src/Microsoft.ML.FastTree/FastTreeRanking.cs | 48.19% <0%> | (ø) ⬆️ |
| src/Microsoft.ML.FastTree/FastTreeTweedie.cs | 56.29% <0%> | (ø) ⬆️ |
| src/Microsoft.ML.FastTree/FastTreeRegression.cs | 54.5% <0%> | (ø) ⬆️ |
| ... and 8 more | | |
#Closed

@Ivanidzo4ka changed the title from "Remove duplicate NormalizeFeatures from FFM trainer" to "WIP Remove duplicate NormalizeFeatures from FFM trainer" Mar 15, 2019
@Ivanidzo4ka (Contributor) commented Mar 15, 2019

Pls don't merge yet!

Sure, I put WIP into the PR title; feel free to take it out after you're done. #Closed

// Fields skipped as duplicate names are always skipped in the same, correct order.
// A field with the same name can bubble up from a base class even though
// it is overridden / hidden; skip it.
if (collectedFields.Contains(name))
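The skip above keeps only the first field seen per name. A Python analog of that first-wins walk (all names here are hypothetical; the real code inspects C# reflection metadata):

```python
def collect_fields(classes_fields):
    """Walk field dicts in a stable order (derived class first, then bases),
    keeping only the first field seen for each name."""
    collected = {}
    for fields in classes_fields:
        for name, info in fields.items():
            if name in collected:
                # Duplicate bubbled up from a base class: skip it.
                continue
            collected[name] = info
    return collected

# The derived class hides the base field of the same name via "new".
derived = {"NormalizeFeatures": "bool, default false"}
base = {"NormalizeFeatures": "NormalizeOption", "Caching": "CachingOptions"}
collected = collect_fields([derived, base])
```

As discussed in this thread, the approach relies only on the enumeration order being stable and correct for the one duplicated field, not on a general derived-before-base guarantee.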
@wschin (Member) Mar 15, 2019

If there are two Features columns and the second one is hidden, which one will be picked up here? I couldn't see any mechanism to filter out hidden fields. Am I missing something? #Resolved

@ganik (Member Author) Mar 15, 2019

This is not about columns; this is about fields in a class. Did you mean fields? If yes, then please read the extensive comment I wrote on top.


In reply to: 266176048

@wschin (Member)

Sorry. Please replace columns with fields in my comment.


In reply to: 266176509

@ganik (Member Author)

Yes, I got it; please see the comment I wrote on top.


In reply to: 266176749

@wschin (Member) Mar 16, 2019

Do you think this comment is clearer?

                // The fields returned by GetFields are stably ordered, starting from the derived class's fields
                // and ending with ones from base classes. Therefore,
                // unit tests that compare manifests keep passing as long as the new field's type and name are identical to
                // what it overrides. For the same reason,
                // we can always skip the base field overridden by another field introduced by "new".

In reply to: 266176749 [](ancestors = 266176749,266176509,266176048)

@ganik (Member Author)

It's not exactly correct. I am not sure that fields are returned first for derived classes and then for base classes; I didn't make such a statement. Moreover, the documentation for GetFields() says that fields may be returned in any order. The only statements I am making are: 1. The order is stable (always the same order of fields). 2. For the currently duplicated field (and there is only one in our case, NormalizeFeatures), the order is "correct" with the fix, i.e. the derived field is returned first and then the base one. Note the difference: I do not state in general that derived fields are returned first and then base ones.


In reply to: 266179947 [](ancestors = 266179947,266176749,266176509,266176048)

@ganik (Member Author)

Talked to Wei-Sheng offline; this looks good to him as it is. Thank you.


In reply to: 266546642

/// <summary>
/// Whether to normalize the input vectors so that the concatenation of all fields' feature vectors is unit-length.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "Whether to normalize the input vectors so that the concatenation of all fields' feature vectors is unit-length", ShortName = "norm", SortOrder = 6)]
public bool Normalize = true;
public new bool NormalizeFeatures = true;
@wschin (Member) Mar 15, 2019

Why do we need to hide it? Could we set the underlying field to always be false and expose Normalize as is? I feel your previous solution is just what we need (almost). #Closed

@ganik (Member Author) Mar 15, 2019

Then there will be two Normalize* input fields mentioned in manifest.json, and confusion over which one to use.


In reply to: 266176252

@wschin (Member) Mar 15, 2019

No one cares about entry points this week and next, right? I roughly remember that filtering out hidden fields is a bit difficult, so I want to make your life easier.


In reply to: 266176427

@ganik (Member Author)

Yes, it's difficult to filter out hidden fields, but it's currently not needed in this PR. It would be good to fix this, though, as NimbusML depends on manifest.json; this duplication will propagate confusion into the set of params for the FFM class in NimbusML, as well as into the docs.


In reply to: 266176837

@ganik changed the title from "WIP Remove duplicate NormalizeFeatures from FFM trainer" to "Remove duplicate NormalizeFeatures from FFM trainer" Mar 18, 2019
@ganik merged commit 0835c52 into dotnet:master Mar 18, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2022