[SPARK-15668] [ML] ml.feature: update check schema to avoid confusion when user use MLlib.vector as input type #13411

hhbyyh · 2016-05-31T13:00:28Z

What changes were proposed in this pull request?

ml.feature: update the schema check to avoid confusion when a user uses MLlib.vector as the input type

How was this patch tested?

Existing unit tests.

SparkQA · 2016-05-31T13:40:17Z

Test build #59652 has finished for PR 13411 at commit ee33d2e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

MLnick · 2016-06-01T01:21:18Z

mllib/src/main/scala/org/apache/spark/ml/feature/MinMaxScaler.scala

  /** Validates and transforms the input schema. */
  protected def validateAndTransformSchema(schema: StructType): StructType = {
    require($(min) < $(max), s"The specified min(${$(min)}) is larger or equal to max(${$(max)})")
    val inputType = schema($(inputCol)).dataType


inputType is not required any longer, right?

hhbyyh · 2016-06-01T03:06:07Z

Thanks for helping review. @MLnick

SparkQA · 2016-06-01T04:00:38Z

Test build #59711 has finished for PR 13411 at commit 81d87b9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

MLnick · 2016-06-02T21:13:00Z

jenkins retest this please

SparkQA · 2016-06-02T22:15:50Z

Test build #59876 has finished for PR 13411 at commit 81d87b9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

MLnick · 2016-06-02T22:40:11Z

mllib/src/main/scala/org/apache/spark/ml/feature/MaxAbsScaler.scala

-    val inputType = schema($(inputCol)).dataType
-    require(inputType.isInstanceOf[VectorUDT],
-      s"Input column ${$(inputCol)} must be a vector column")
+    SchemaUtils.checkColumnType(schema, $(inputCol), new VectorUDT)


Just a note on this: because the check takes new VectorUDT, the resulting message contains VectorUDT@XYZ, i.e. an instance, which is OK but not ideal. For the built-in types we have a case object that makes the message cleaner, so we could consider doing the same for VectorUDT since it is used a lot, e.g.

private[spark] case object VectorUDT extends VectorUDT

Or alternatively, in SchemaUtils.checkColumnType we could use getClass.getName instead.
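
The two options in this comment can be compared with a small sketch that runs without Spark on the classpath. Everything in it is a stand-in written for illustration: the DataType trait, the VectorUDT class, both check functions, and the failureMessage helper are assumptions, not the real Spark classes or the actual SchemaUtils implementation. It only aims to show why an instance of a plain class prints as ClassName@hash in the message, and how a case object or getClass.getName changes that.

```scala
object SchemaCheckSketch {
  // Stand-in for a data type hierarchy; not the real Spark classes.
  trait DataType

  // A built-in-style type: a case object prints as its own name.
  case object DoubleType extends DataType

  // A UDT-style type: all instances compare equal, but toString is inherited
  // from AnyRef, so an instance prints as ClassName@hexHash.
  class VectorUDT extends DataType {
    override def equals(other: Any): Boolean = other.isInstanceOf[VectorUDT]
    override def hashCode(): Int = classOf[VectorUDT].getName.hashCode
  }

  // Option 1 from the comment: a case object, which gets a generated toString.
  case object VectorUDT extends VectorUDT

  // Hypothetical equality-based check that reports both types in the message.
  def checkColumnType(
      schema: Map[String, DataType], colName: String, expected: DataType): Unit = {
    val actual = schema(colName)
    require(actual == expected,
      s"Column $colName must be of type $expected but was actually $actual.")
  }

  // Option 2 from the comment: format the expected type via getClass.getName.
  def checkColumnTypeByName(
      schema: Map[String, DataType], colName: String, expected: DataType): Unit = {
    val actual = schema(colName)
    require(actual == expected,
      s"Column $colName must be of type ${expected.getClass.getName} " +
        s"but was actually $actual.")
  }

  // Runs a check and returns the failure message, if there is one.
  def failureMessage(check: => Unit): Option[String] =
    try { check; None }
    catch { case e: IllegalArgumentException => Some(e.getMessage) }

  def main(args: Array[String]): Unit = {
    val schema: Map[String, DataType] = Map("features" -> DoubleType)
    // Expected type is printed as an instance, so the message contains '@'.
    failureMessage(checkColumnType(schema, "features", new VectorUDT)).foreach(println)
    // Expected type is printed by the case object's generated toString.
    failureMessage(checkColumnType(schema, "features", VectorUDT)).foreach(println)
    // Expected type is printed as a class name, without the hash suffix.
    failureMessage(checkColumnTypeByName(schema, "features", new VectorUDT)).foreach(println)
  }
}
```

In this sketch the case object keeps call sites short and changes nothing in the check itself, while the getClass.getName variant leaves the types alone and only changes how the message is formatted. Which trade-off suits the real code base is left open in the thread.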

MLnick · 2016-06-02T23:36:14Z

I'm going to go ahead and merge this. We can make a call on #13411 (comment) and if anything needs doing we can do that in a follow up.

MLnick · 2016-06-02T23:37:45Z

Thanks, merged to master/branch-2.0

…when user use MLlib.vector as input type

## What changes were proposed in this pull request?

ml.feature: update check schema to avoid confusion when user use MLlib.vector as input type

## How was this patch tested?

existing ut

Author: Yuhao Yang <[email protected]>

Closes #13411 from hhbyyh/schemaCheck.

(cherry picked from commit 5855e00)
Signed-off-by: Nick Pentreath <[email protected]>

MLnick · 2016-06-03T01:29:52Z

Created https://issues.apache.org/jira/browse/SPARK-15746 to track the VectorUDT@XYZ error message printing.

update schema check

ee33d2e

Merge remote-tracking branch 'upstream/master' into schemaCheck

dda8c2b

MLnick reviewed Jun 1, 2016

remove inputtype

81d87b9

MLnick reviewed Jun 2, 2016

asfgit closed this in 5855e00 Jun 2, 2016
