@seancxmao
Contributor

What changes were proposed in this pull request?

Spark SQL returns NULL for a column whose name differs in letter case between the Hive metastore schema and the Parquet schema, regardless of whether spark.sql.caseSensitive is set to true or false. This applies not only to Parquet but also to ORC. The following is a brief summary:

  • ParquetFileFormat doesn't support case-insensitive field resolution.
  • native OrcFileFormat supports case-insensitive field resolution; however, it cannot handle duplicate fields.
  • hive OrcFileFormat doesn't support case-insensitive field resolution.

#15799 reverted case-insensitive resolution for ParquetFileFormat and hive OrcFileFormat. This PR brings it back and refines it: case-insensitive resolution is applied only when Spark is in case-insensitive mode, and field resolution fails if there is ambiguity, i.e. more than one field matches. ParquetFileFormat, native OrcFileFormat, and hive OrcFileFormat are all covered.
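
The resolution rule described above can be sketched roughly as follows. This is a minimal, standalone illustration in plain Python, not the code in this patch: `resolve_field`, `file_fields`, and the error message are hypothetical names chosen for the example, and `case_sensitive` only stands in for the value of spark.sql.caseSensitive.

```python
from typing import Optional, Sequence


def resolve_field(
    requested: str,
    file_fields: Sequence[str],
    case_sensitive: bool,
) -> Optional[str]:
    """Return the name of the file field that matches `requested`.

    Hypothetical helper for illustration only. Returns None when no
    field matches, which corresponds to the column being read as NULL.
    """
    if case_sensitive:
        # Case-sensitive mode: only an exact name match counts.
        return requested if requested in file_fields else None

    # Case-insensitive mode: compare lower-cased names.
    matches = [f for f in file_fields if f.lower() == requested.lower()]
    if len(matches) > 1:
        # More than one field matched, so resolution is ambiguous and fails.
        raise ValueError(
            f"Found duplicate field(s) for '{requested}': {matches} "
            "in case-insensitive mode"
        )
    return matches[0] if matches else None
```

Under these assumptions, a metastore column `a` resolves to a file field `A` only in case-insensitive mode, while a file that contains both `a` and `A` raises an error in case-insensitive mode and resolves to the exact match `a` in case-sensitive mode.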

How was this patch tested?

Unit tests added.

…uet/ORC

* Fix ParquetFileFormat
* More than one Parquet column is matched
* Fix OrcFileFormat (both native and hive implementations)
* Fix issues according to review results: refactor test cases, code style, ...
* Test cases: change parquet/orc file schema from a to A
* Test cases: let different columns have different value series
* Refine error message
* Split multi-format test suite
* Simplify test cases for ambiguous resolution
* Simplify test cases to reduce code lines
* Refine tests and comments
@AmplabJenkins

Can one of the admins verify this patch?

@seancxmao seancxmao changed the title from "[SPARK-25132][SQL] case-insensitive field resolution when reading from Parquet/ORC" to "[SPARK-25132][SQL] Case-insensitive field resolution when reading from Parquet/ORC" on Aug 20, 2018
@seancxmao
Contributor Author

Split this into 2 PRs, one for Parquet and one for ORC.

@seancxmao seancxmao closed this Aug 20, 2018
@seancxmao seancxmao deleted the SPARK-25132 branch August 22, 2018 09:44