[SPARK-16926] [SQL] Remove partition columns from partition metadata. #14515
This triggers the else case here: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L368. cc: @andrewor14
// Note: In Hive the schema and partition columns must be disjoint sets
val schema = catalogTable.schema.map(toHiveColumn).filter { c =>
  !catalogTable.partitionColumnNames.contains(c.getName)
Ah, good catch! It would be better to have a test proving that the unnecessary conversion object inspector is removed.
@cloud-fan I've instead created a unit test that simply checks if the number of columns in the table and partition metadata are the same for a newly created table. Since this PR has been merged already, I created a new one: #14930.
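The kind of check described above can be sketched in plain Scala. This is an illustration only, not the test added in #14930: `PartitionMeta`, `TableWithPartitions`, and `mismatched` are hypothetical names, and the metadata is modeled with simple case classes rather than the Hive client API.

```scala
// Hypothetical metadata shapes for illustration; not the Hive client API.
final case class PartitionMeta(spec: Map[String, String], columnNames: Seq[String])
final case class TableWithPartitions(columnNames: Seq[String], partitions: Seq[PartitionMeta])

object ColumnCountCheck {
  // Returns the specs of partitions whose column count differs from the table's.
  def mismatched(table: TableWithPartitions): Seq[Map[String, String]] =
    table.partitions
      .filter(_.columnNames.size != table.columnNames.size)
      .map(_.spec)

  def main(args: Array[String]): Unit = {
    // A table whose single partition carries the same columns as the table itself.
    val table = TableWithPartitions(
      columnNames = Seq("id", "value"),
      partitions = Seq(PartitionMeta(Map("ds" -> "2016-08-01"), Seq("id", "value"))))

    assert(mismatched(table).isEmpty, "table and partition column counts should match")
  }
}
```

Comparing only the column counts keeps the check cheap, at the cost of missing a partition whose columns differ in name or type but not in number.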
ok to test
Test build #64771 has finished for PR 14515 at commit
LGTM, merging this into 2.0 and master, thanks!
## What changes were proposed in this pull request?

This removes partition columns from column metadata of partitions to match tables.

A change introduced in SPARK-14388 removed partition columns from the column metadata of tables, but not for partitions. This causes TableReader to believe that the schema is different between table and partition, and create an unnecessary conversion object inspector in TableReader.

## How was this patch tested?

Existing unit tests.

Author: Brian Cho <[email protected]>

Closes #14515 from dafrista/partition-columns-metadata.

(cherry picked from commit 473d786)
Signed-off-by: Davies Liu <[email protected]>
…n metadata.

## What changes were proposed in this pull request?

Add unit test for changes made in PR #14515. It makes sure that a newly created table has the same number of columns in table and partition metadata. This test fails before the changes introduced in #14515.

## How was this patch tested?

Run new unit test.

Author: Brian Cho <[email protected]>

Closes #14930 from dafrista/partition-metadata-unit-test.
## What changes were proposed in this pull request?
This removes partition columns from column metadata of partitions to match tables.
A change introduced in SPARK-14388 removed partition columns from the column metadata of tables, but not from the column metadata of partitions. As a result, TableReader treats the table and partition schemas as different and creates an unnecessary conversion object inspector.
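To make the mismatch concrete, here is a minimal, self-contained sketch in plain Scala. `Column`, `TableMeta`, and `dataColumns` are hypothetical stand-ins, not Spark or Hive classes, and the "before" shape of the partition metadata is an assumption based on the description above. The sketch only mirrors the idea of the filter in this patch, which drops partition columns from the column list.

```scala
// Hypothetical stand-ins for catalog metadata; not Spark or Hive classes.
final case class Column(name: String, dataType: String)
final case class TableMeta(schema: Seq[Column], partitionColumnNames: Seq[String])

object PartitionColumnsSketch {
  // Keep only non-partition columns, mirroring the filter in this patch.
  def dataColumns(table: TableMeta): Seq[Column] =
    table.schema.filterNot(c => table.partitionColumnNames.contains(c.name))

  def main(args: Array[String]): Unit = {
    val table = TableMeta(
      schema = Seq(Column("id", "int"), Column("value", "string"), Column("ds", "string")),
      partitionColumnNames = Seq("ds"))

    // Table-level column metadata already excludes partition columns.
    val tableCols = dataColumns(table)
    // Assumed shape of partition metadata before this patch: partition columns included.
    val partitionColsBefore = table.schema
    // Partition metadata after this patch: same filter as the table.
    val partitionColsAfter = dataColumns(table)

    println(tableCols == partitionColsBefore) // false: the schemas look different
    println(tableCols == partitionColsAfter)  // true: the schemas match
  }
}
```

With the same filter applied on both sides, the two column lists are equal, so a reader comparing them has no reason to set up a conversion between them.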
## How was this patch tested?
Existing unit tests.