@hhbyyh
Contributor

@hhbyyh hhbyyh commented May 31, 2016

What changes were proposed in this pull request?

ml.feature: update the schema check to avoid confusion when a user passes an MLlib vector as the input type

How was this patch tested?

Existing unit tests.

@SparkQA

SparkQA commented May 31, 2016

Test build #59652 has finished for PR 13411 at commit ee33d2e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

/** Validates and transforms the input schema. */
protected def validateAndTransformSchema(schema: StructType): StructType = {
  require($(min) < $(max), s"The specified min(${$(min)}) is larger or equal to max(${$(max)})")
  val inputType = schema($(inputCol)).dataType
Contributor

inputType is not required any longer, right?

@hhbyyh
Contributor Author

hhbyyh commented Jun 1, 2016

Thanks for helping review. @MLnick

@SparkQA

SparkQA commented Jun 1, 2016

Test build #59711 has finished for PR 13411 at commit 81d87b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MLnick
Contributor

MLnick commented Jun 2, 2016

jenkins retest this please

@SparkQA

SparkQA commented Jun 2, 2016

Test build #59876 has finished for PR 13411 at commit 81d87b9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

-    val inputType = schema($(inputCol)).dataType
-    require(inputType.isInstanceOf[VectorUDT],
-      s"Input column ${$(inputCol)} must be a vector column")
+    SchemaUtils.checkColumnType(schema, $(inputCol), new VectorUDT)
Contributor

Just a note on this: because the check requires new VectorUDT, the resulting message contains VectorUDT@XYZ (i.e. an instance), which is OK but not ideal. For the built-in types we have a case object that makes it cleaner, so we could think about doing the same for VectorUDT, as it is used a lot, e.g.

private[spark] case object VectorUDT extends VectorUDT

Or alternatively, in SchemaUtils.checkColumnType we could use getClass.getName instead.
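
To make the two options above concrete, here is a minimal, self-contained sketch. It does not use Spark at all: FakeVectorUDT and SchemaCheckSketch are hypothetical stand-ins invented for illustration, and the real SchemaUtils.checkColumnType may build its message differently. It only shows why an instance prints as Name@hash, and how a case object or getClass.getName would avoid that.

```scala
// Hypothetical stand-in for a user-defined type; NOT Spark's real VectorUDT.
// It does not override toString, so instances use the default from java.lang.Object.
class FakeVectorUDT

// Option 1 from the comment: a case object extending the class.
// The compiler generates a toString that returns the object's name.
case object FakeVectorUDT extends FakeVectorUDT

// Hypothetical helper object; not part of Spark.
object SchemaCheckSketch {
  // Describes a type via toString, as a plain string interpolation would.
  def describeByToString(dataType: AnyRef): String = s"$dataType"

  // Option 2 from the comment: describe a type by its class name instead.
  def describeByClassName(dataType: AnyRef): String = dataType.getClass.getName
}
```

With these definitions, describeByToString(new FakeVectorUDT) yields the class name followed by @ and a hash code, whereas describeByToString(FakeVectorUDT) yields just FakeVectorUDT. describeByClassName(new FakeVectorUDT) also avoids the hash suffix, at the cost of printing the fully qualified class name when the class lives in a package.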

@MLnick
Contributor

MLnick commented Jun 2, 2016

I'm going to go ahead and merge this. We can make a call on #13411 (comment), and if anything needs doing we can do it in a follow-up.

@MLnick
Contributor

MLnick commented Jun 2, 2016

Thanks, merged to master/branch-2.0

@asfgit asfgit closed this in 5855e00 Jun 2, 2016
asfgit pushed a commit that referenced this pull request Jun 2, 2016
…when user use MLlib.vector as input type

## What changes were proposed in this pull request?

ml.feature: update check schema to avoid confusion when user use MLlib.vector as input type

## How was this patch tested?
existing ut

Author: Yuhao Yang <[email protected]>

Closes #13411 from hhbyyh/schemaCheck.

(cherry picked from commit 5855e00)
Signed-off-by: Nick Pentreath <[email protected]>
@MLnick
Contributor

MLnick commented Jun 3, 2016

Created https://issues.apache.org/jira/browse/SPARK-15746 to track the VectorUDT@XYZ error message printing.

4 participants