
@seancxmao
Contributor

What changes were proposed in this pull request?

This is a backport of #22148

Spark SQL returns NULL for a column whose name differs only in letter case between the Hive metastore schema and the Parquet schema, regardless of whether spark.sql.caseSensitive is set to true or false. This PR adds case-insensitive field resolution to ParquetFileFormat.

  • Do case-insensitive resolution only if Spark is in case-insensitive mode.
  • Field resolution should fail if there is ambiguity, i.e. more than one field is matched.
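The two rules above can be illustrated with a minimal Python sketch. It is not Spark's code (the actual patch is in Scala, in the Parquet read path of ParquetFileFormat); the names resolve_field and AmbiguousFieldError are hypothetical and only model the described behavior: exact matching in case-sensitive mode, lower-cased matching otherwise, and an error when more than one field matches.

```python
from typing import Optional, Sequence


class AmbiguousFieldError(Exception):
    """Hypothetical error: a requested column matches several Parquet fields."""


def resolve_field(requested: str, parquet_fields: Sequence[str],
                  case_sensitive: bool) -> Optional[str]:
    """Return the Parquet field that `requested` resolves to, or None if absent.

    Sketch of the rules described in the PR, not Spark's implementation.
    """
    if case_sensitive:
        # Case-sensitive mode: only an exact name match counts.
        return requested if requested in parquet_fields else None
    # Case-insensitive mode: compare names after lower-casing both sides.
    matches = [f for f in parquet_fields if f.lower() == requested.lower()]
    if len(matches) > 1:
        # Ambiguity: more than one field matched, so resolution fails.
        raise AmbiguousFieldError(
            f"Found duplicate field(s) {requested!r}: {matches} "
            "in case-insensitive mode")
    return matches[0] if matches else None


if __name__ == "__main__":
    # Hypothetical Parquet file schema written with mixed-case names.
    fields = ["ID", "Name", "name"]
    print(resolve_field("id", fields, case_sensitive=False))  # ID
    print(resolve_field("id", fields, case_sensitive=True))   # None
    try:
        resolve_field("NAME", fields, case_sensitive=False)
    except AmbiguousFieldError as err:
        print("error:", err)
```

In case-insensitive mode, a metastore column "id" now finds the Parquet field "ID" instead of reading as NULL, while "NAME" is rejected because both "Name" and "name" match.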

How was this patch tested?

Unit tests added.

@gatorsmile
Member

@cloud-fan
Contributor

ok to test

@cloud-fan
Contributor

+1 to backport this. I think it's a bug that we don't respect case-sensitive config when resolving parquet fields.

@yucai
Contributor

yucai commented Aug 24, 2018

We need to backport it. Without this PR, we cannot solve the data issue in [SPARK-25206] Wrong data may be returned when enable pushdown.

@SparkQA

SparkQA commented Aug 24, 2018

Test build #95215 has finished for PR 22183 at commit 2831588.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

For Hive tables, column resolution is always case insensitive. However, when spark.sql.hive.convertMetastoreParquet is true, users might see inconsistent behavior when the native Parquet reader resolves columns in case-sensitive mode. This still introduces a behavior change. Better error messages sound good enough, rather than disabling spark.sql.hive.convertMetastoreParquet when the mode is case sensitive.

@AmplabJenkins

Can one of the admins verify this patch?

@HyukjinKwon
Member

BTW, @gatorsmile and @cloud-fan, do you know who did this ^ and why?

@cloud-fan
Contributor

As discussed in the JIRA, this is a partial fix, and we need to backport another 2 PRs, which is risky. Can we close it?

@HyukjinKwon
Member

ohh.. no no .. I meant:

Can one of the admins verify this patch?

@cloud-fan
Contributor

ah, I think there is a service that comments on stale PRs to ask people to review them. I don't know who maintains this service, though...



7 participants