-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Initial featurizers project #4413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial featurizers project #4413
Conversation
pkg/Microsoft.ML.Featurizers/Microsoft.ML.Featurizers.nupkgproj
Outdated
Show resolved
Hide resolved
pkg/Microsoft.ML.Featurizers/Microsoft.ML.Featurizers.nupkgproj
Outdated
Show resolved
Hide resolved
| // Not all these types are currently supported. This is so the ordering will allign with the native code. | ||
| internal enum TypeId : uint | ||
| { | ||
| String = 1, SByte, Short, Int, Long, Byte, UShort, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typical C# code has these be on separate lines. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have a int next to each? This will assist with long term compatibility by allowing for inserting new values, and in case someone decides to sort the enum differently later (more important if serialized in a model). #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you are correct. I have added an int next to each one now. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks #Resolved
pkg/Microsoft.ML.Featurizers/Microsoft.ML.Featurizers.nupkgproj
Outdated
Show resolved
Hide resolved
| var meth = MarshalActionInvokeCheckAndCreate(func, genArgs); | ||
| if (meth.ReturnType != typeof(TRet)) | ||
| throw Contracts.ExceptParam(nameof(func), "Cannot be generic on return type"); | ||
| return meth; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think method is a reserved keyword in C#. Recommend the longer name.
| return meth; | |
| return method; | |
| ``` #Resolved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept it as meth because that is consistent with all the other methods here that do this same thing. I think if we want it to be method then we need to change them all.
In reply to: 347619466 [](ancestors = 347619466)
|
|
||
| <PropertyGroup> | ||
| <TargetFramework>netstandard2.0</TargetFramework> | ||
| <PackageDescription>Additional ML.NET featurizers</PackageDescription> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional ML.NET featurizers [](start = 24, length = 29)
ML .NET featurizers with native code implementation. #Resolved
| return meth; | ||
| } | ||
|
|
||
| private static MethodInfo MarshalInvokeCheckAndCreate<TRet>(Delegate func, Type[] genArgs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delegate func, Type[] genArgs) [](start = 68, length = 30)
lets keep the order consistent types then delegate #Resolved
| return meth; | ||
| } | ||
|
|
||
| private static MethodInfo MarshalActionInvokeCheckAndCreate(Delegate func, params Type[] typeArguments) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Delegate func, params Type[] typeArguments) [](start = 67, length = 44)
same here, lets keep the order consistent, types then delegate #Resolved
|
|
||
| internal enum FitResult : byte | ||
| { | ||
| Complete = 1, Continue, ResetAndContinue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Complete = 1, Continue, ResetAndContinue [](start = 8, length = 40)
put each on its own line and explicitly define values. #Resolved
| Tabular = 0x1001 | LastStaticValue + 3, | ||
| Nullable = 0x1001 | LastStaticValue + 4, | ||
| Vector = 0x1001 | LastStaticValue + 5, | ||
| MapId = 0x1002 | LastStaticValue + 6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MapId = 0x1002 | LastStaticValue + 6 [](start = 8, length = 36)
please add comments for this #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this add LastStaticValue? In the future, this will shift all of the uint values by 1, as a new type is inserted to the top section.
Why the math? Seems more simple to directly map to the values.
Why is the base of the last one 0x1002? That skips a value. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these are taken directly as is from the Native code base. They are here just so we have a correct mapping on our side.
LastStaticValue is supposed to represent the last stand alone type. All the types that follow after it are compound types. There is probably a more appropriate name, but since I want to keep this identical to the Native code I left it as is.
The math is there just to keep it identical to the native code. I could directly map the values, but then I would have to map the values every time I need to check if it has changed in the native code. Leaving it like this makes comparisons much easier.
The number at the end of 0x100N represents how many types come after it that are associated to it. So Tabular, Nullable, and Vector, end in 1 meaning only 1 type follows it. The Map ends in 2 because we need types for the Keys and for the Values.
In reply to: 348115118 [](ancestors = 348115118)
| return Encoding.UTF8.GetString(buffer); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} [](start = 8, length = 1)
new line #Resolved
| return TypeId.Int; | ||
| else if (type == typeof(long)) | ||
| return TypeId.Long; | ||
| else if (type == typeof(byte)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(byte)) [](start = 35, length = 7)
sbyte #Resolved
| else if (type == typeof(long)) | ||
| return TypeId.Long; | ||
| else if (type == typeof(byte)) | ||
| return TypeId.Byte; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Byte [](start = 30, length = 4)
SByte #Resolved
codemzs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
![]()
| : base(provider, input) | ||
| { | ||
| var pred = parent.GetActiveOutputColumns(active); | ||
| _getters = parent._mapper.CreateGetters(input, pred, out _disposer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_getters = parent._mapper.CreateGetters(input, pred, out _disposer); [](start = 16, length = 68)
lets revert this and use a thread safe dictionary for now. I want to think a little more on this. #Resolved
Codecov Report
@@ Coverage Diff @@
## master #4413 +/- ##
=========================================
Coverage ? 74.86%
=========================================
Files ? 908
Lines ? 159862
Branches ? 17214
=========================================
Hits ? 119688
Misses ? 35354
Partials ? 4820
|
|
@michaelgsharp Can you please sync to master? |
ToStringTransformer is done. CatagoryImputer is done. TimeSeriesImputer is done. RobustScaler is done. Adding in samples and documentation. General code cleanup. Made the RowToRowMapperTransform create a new mapper if possible for each cursor.
dce7223 to
7fc97ae
Compare
Added line in Directory.Build.Props that was removed during the rebase process. The line already exists in master and shouldn't be removed.
This is the initial featurizers PR to split apart PR #4157 and get all the common code into master.
Once this code goes in, I will create separate PR's for each of the 5 featurizers.
This is the common code for the featurizers. Its mostly changes to project files and solution files. Common.cs is for all the shared code for the featurizers. The change in Utils.cs is to allow
Marshal.Invokewith multiple type parameters. The RowToRowMapperTransform.cs change has it create a new mapper when possible. This helps with thread safety/local caching of the mappers when run in a multi threaded approach.