@Ngone51
Member

@Ngone51 Ngone51 commented Dec 6, 2019

What changes were proposed in this pull request?

Issue a better error message when the user-specified schema does not match the relation schema.

Why are the changes needed?

Inspired by #25248 (comment): users can get a confusing error message when the type mapping between the Spark schema and the data source schema (e.g. JDBC) changes. Instead of saying "SomeProvider does not allow user-specified schemas.", we should tell users what is actually happening so that the cause of the error is clear.

Does this PR introduce any user-facing change?

Yes, users will see a different error message.

How was this patch tested?

Updated existing tests.

@Ngone51
Member Author

Ngone51 commented Dec 6, 2019

cc @cloud-fan @gatorsmile

@SparkQA

SparkQA commented Dec 6, 2019

Test build #114950 has finished for PR 26781 at commit dd45804.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Ngone51
Member Author

Ngone51 commented Dec 6, 2019

The failed test ThriftServerWithSparkContextSuite.SPARK-29911: Uncache cached tables when session closed looks unrelated and it passed locally.

@Ngone51
Member Author

Ngone51 commented Dec 6, 2019

Jenkins, retest this please.

@SparkQA

SparkQA commented Dec 6, 2019

Test build #114955 has finished for PR 26781 at commit dd45804.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

  .unzip
if (persistentFields.nonEmpty) {
  val errorMsg =
    s"Mismatched fields detected between persistent schema and user specified schema: " +
Member

nit: seems like we can remove the s.

Member Author

Do you mean: fields -> field?

Member

Nope, I meant the s for string interpolation (s"...)
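
To illustrate the nit, here is a minimal Scala sketch (the object and value names are made up for the example): the s prefix only matters when the literal contains a placeholder such as $name or ${...}, so dropping it from a literal without placeholders leaves the string unchanged.

```scala
object InterpolatorDemo {
  // No placeholder inside the literal, so the s prefix is redundant here.
  val plain: String = "Mismatched fields detected: "
  val interpolated: String = s"Mismatched fields detected: "

  // Here the s prefix is required, because $name must be substituted.
  val name: String = "id"
  val withPlaceholder: String = s"Mismatched field: $name"

  def main(args: Array[String]): Unit = {
    println(plain == interpolated)
    println(withPlaceholder)
  }
}
```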

val persistentSize = persistentSchema.size
val specifiedSize = specifiedSchema.size
if (persistentSize == specifiedSize) {
  val (persistentFields, specifiedFields) = persistentSchema.zip(specifiedSchema)
Member

@HyukjinKwon HyukjinKwon Dec 7, 2019

If we're going to improve this kind of error message across the codebase, we might also consider having a common method (maybe something called assertEquality in StructType?) that checks each type recursively and shows a better message. Can we at least have a private method here for this case in the future?

Member Author

I'm not sure whether we'd need similar functionality elsewhere in the future. But maybe we could still give it a try.

Member

@HyukjinKwon HyukjinKwon Dec 8, 2019

Yeah, I think it won't handle nested cases. Other external data sources support nested schemas, and the current code reports only root-level columns.
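
For illustration only, a recursive check of the kind suggested in this thread could look roughly like the sketch below. It does not use Spark: DType, Field, StructT and firstMismatch are hypothetical, simplified stand-ins for Spark's type classes, and this is not code from the PR. The point is that carrying a path through the recursion lets the message name a nested column rather than only the root one.

```scala
object SchemaDiff {
  // Hypothetical, simplified stand-ins for Spark's DataType / StructType.
  sealed trait DType
  case object IntT extends DType
  case object StringT extends DType
  final case class Field(name: String, dtype: DType)
  final case class StructT(fields: List[Field]) extends DType

  // Returns a description of the first mismatch found, or None if the types are equal.
  def firstMismatch(expected: DType, actual: DType, path: String = "root"): Option[String] =
    (expected, actual) match {
      case (StructT(e), StructT(a)) =>
        if (e.size != a.size) {
          Some(s"$path: expected ${e.size} fields but found ${a.size}")
        } else {
          // Walk the fields pairwise and stop at the first reported mismatch.
          e.zip(a).iterator.map { case (ef, af) =>
            if (ef.name != af.name) {
              Some(s"$path: expected field '${ef.name}' but found '${af.name}'")
            } else {
              firstMismatch(ef.dtype, af.dtype, s"$path.${ef.name}")
            }
          }.collectFirst { case Some(msg) => msg }
        }
      case (e, a) if e == a => None
      case (e, a) => Some(s"$path: expected $e but found $a")
    }
}
```

With a nested address.zip column whose type differs, this sketch reports the path root.address.zip instead of just the top-level address column.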

Member

Also, there are many cases to show better error messages like this. E.g., StructType.merge or _merge_type in Python's schema inference (https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L1097-L1111)

Member

see #19792 or #18521 as an example.

Member Author

Hi @HyukjinKwon, after discussing with Wenchen offline, we decided not to make this too complicated. If the schemas do not match, we simply show the whole schemas to the user rather than only the mismatched fields, as the previous version did. Please see de036b6.
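
As a rough sketch of the simplified approach described here (not the code in de036b6), the check can compare the two schemas as a whole and print both in the message. The Schema alias, render and check below are hypothetical names for this example, and render only imitates the spirit of a DDL string; it is not Spark's toDDL.

```scala
object WholeSchemaCheck {
  // Hypothetical stand-in: a schema as ordered (column name, type name) pairs.
  type Schema = Seq[(String, String)]

  // Renders a schema as a comma-separated "name type" list (an imitation, not Spark's toDDL).
  def render(schema: Schema): String =
    schema.map { case (name, tpe) => s"$name $tpe" }.mkString(",")

  // Returns Left(message) when the schemas differ, Right(()) when they are equal.
  def check(userSpecified: Schema, actual: Schema): Either[String, Unit] =
    if (userSpecified == actual) {
      Right(())
    } else {
      Left(
        "The user-specified schema doesn't match the actual schema: " +
          s"user-specified: ${render(userSpecified)}, actual: ${render(actual)}.")
    }
}
```

Showing both schemas in full avoids any per-field diff logic, at the cost of leaving it to the reader to spot which column differs.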

// only implements the RelationProvider or the SchemaRelationProvider.
Seq("TEMPORARY VIEW", "TABLE").foreach { tableType =>
  val schemaNotAllowed = intercept[Exception] {
  val schemaNotMatch = intercept[Exception] {
Member Author

Renamed the variable so that it reads better alongside the current error message.

@SparkQA

SparkQA commented Dec 9, 2019

Test build #115037 has finished for PR 26781 at commit f54dea9.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class JdbcUtilsSuite extends SparkFunSuite

@SparkQA

SparkQA commented Dec 9, 2019

Test build #115040 has finished for PR 26781 at commit c2b5eea.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

"you're using DataFrameReader.schema API or creating a table, please do not " +
"specify the schema. Or if you're scanning an existed table, please drop " +
"it and re-create it."
throw new AnalysisException(errorMsg)
Member

nit: format like this?

          throw new AnalysisException("The user-specified schema doesn't match the actual schema: " +
            s"user-specified: ${schema.toDDL}, actual: ${baseRelation.schema.toDDL}. If " +
            "you're using DataFrameReader.schema API or creating a table, please do not " +
            "specify the schema. Or if you're scanning an existed table, please drop " +
            "it and re-create it.")

Member Author

updated, thanks!

@SparkQA

SparkQA commented Dec 10, 2019

Test build #115083 has finished for PR 26781 at commit 814821a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@Ngone51
Member Author

Ngone51 commented Dec 10, 2019

Jenkins, retest this please.

@SparkQA

SparkQA commented Dec 10, 2019

Test build #115093 has finished for PR 26781 at commit 814821a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in aa9da93 Dec 10, 2019
@Ngone51
Member Author

Ngone51 commented Dec 10, 2019

thanks @cloud-fan @HyukjinKwon @maropu


6 participants