-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-15668] [ML] ml.feature: update check schema to avoid confusion when user use MLlib.vector as input type #13411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #59652 has finished for PR 13411 at commit
|
| /** Validates and transforms the input schema. */ | ||
| protected def validateAndTransformSchema(schema: StructType): StructType = { | ||
| require($(min) < $(max), s"The specified min(${$(min)}) is larger or equal to max(${$(max)})") | ||
| val inputType = schema($(inputCol)).dataType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inputType is not required any longer, right?
|
Thanks for helping review. @MLnick |
|
Test build #59711 has finished for PR 13411 at commit
|
|
jenkins retest this please |
|
Test build #59876 has finished for PR 13411 at commit
|
| val inputType = schema($(inputCol)).dataType | ||
| require(inputType.isInstanceOf[VectorUDT], | ||
| s"Input column ${$(inputCol)} must be a vector column") | ||
| SchemaUtils.checkColumnType(schema, $(inputCol), new VectorUDT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a note on this - the fact that it requires new VectorUDT results in a message that contains VectorUDT@XYZ i.e. an instance, which is ok but not ideal. For the built-in types we have a case object that makes it cleaner, so we could think about doing that for VectorUDT as it is used a lot, e.g.
private[spark] case object VectorUDT extends VectorUDT
Or alternatively, in SchemaUtils.checkColumnType we could use getClass.getName instead.
|
I'm going to go ahead and merge this. We can make a call on #13411 (comment) and if anything needs doing we can do that in a follow up. |
|
Thanks, merged to master/branch-2.0 |
…when user use MLlib.vector as input type ## What changes were proposed in this pull request? ml.feature: update check schema to avoid confusion when user use MLlib.vector as input type ## How was this patch tested? existing ut Author: Yuhao Yang <[email protected]> Closes #13411 from hhbyyh/schemaCheck. (cherry picked from commit 5855e00) Signed-off-by: Nick Pentreath <[email protected]>
|
Created https://issues.apache.org/jira/browse/SPARK-15746 to track the |
What changes were proposed in this pull request?
ml.feature: update check schema to avoid confusion when user use MLlib.vector as input type
How was this patch tested?
existing ut