[SPARK-16926] [SQL] Remove partition columns from partition metadata. #14515

bchocho · 2016-08-05T22:31:27Z

What changes were proposed in this pull request?

This removes partition columns from column metadata of partitions to match tables.

A change introduced in SPARK-14388 removed partition columns from the column metadata of tables, but not for partitions. This causes TableReader to believe that the schema is different between table and partition, and create an unnecessary conversion object inspector in TableReader.
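The mismatch described above can be modelled in a few lines of plain Scala. This is only a sketch under stated assumptions: `Column`, `PartitionMetadataSketch`, and `needsConversion` are hypothetical stand-ins invented for illustration, not the Hive or Spark classes, and the equality check is a simplified model of how TableReader decides whether a conversion is needed.

```scala
// Hypothetical stand-in for a column entry; not a Hive or Spark class.
final case class Column(name: String, dataType: String)

object PartitionMetadataSketch {
  // Example table: two data columns, partitioned by `ds`.
  val dataColumns: Seq[Column] = Seq(Column("id", "int"), Column("value", "string"))
  val partitionColumns: Seq[Column] = Seq(Column("ds", "string"))
  val partitionColumnNames: Seq[String] = partitionColumns.map(_.name)

  // Assumption: the catalog schema lists data columns followed by partition columns.
  val catalogSchema: Seq[Column] = dataColumns ++ partitionColumns

  // Drops every column whose name is a partition column name.
  def withoutPartitionColumns(schema: Seq[Column]): Seq[Column] =
    schema.filterNot(c => partitionColumnNames.contains(c.name))

  // Table-level column metadata: partition columns already removed (SPARK-14388).
  val tableMetadata: Seq[Column] = withoutPartitionColumns(catalogSchema)

  // Partition-level column metadata before this change: still the full schema.
  val partitionMetadataBefore: Seq[Column] = catalogSchema

  // Partition-level column metadata after this change: filtered like the table.
  val partitionMetadataAfter: Seq[Column] = withoutPartitionColumns(catalogSchema)

  // Simplified model: a conversion is needed whenever the two column lists differ.
  def needsConversion(table: Seq[Column], partition: Seq[Column]): Boolean =
    table != partition
}
```

In this model, `needsConversion(tableMetadata, partitionMetadataBefore)` is true because the partition metadata carries the extra `ds` column, while `needsConversion(tableMetadata, partitionMetadataAfter)` is false.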

How was this patch tested?

Existing unit tests.

bchocho · 2016-08-08T17:09:06Z

This triggers the else case here: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L368.

cc: @andrewor14
Can you please take a look? The columns for tables were removed in SPARK-14388.

cloud-fan · 2016-09-01T08:52:09Z

sql/hive/src/main/scala/org/apache/spark/sql/hive/MetastoreRelation.scala

+
+      // Note: In Hive the schema and partition columns must be disjoint sets
+      val schema = catalogTable.schema.map(toHiveColumn).filter { c =>
+        !catalogTable.partitionColumnNames.contains(c.getName)


Ah, good catch! It would be better to have a test proving that the unnecessary conversion object inspector is removed.

bchocho · 2016-09-02

@cloud-fan I've instead created a unit test that simply checks whether the number of columns in the table and partition metadata is the same for a newly created table. Since this PR has already been merged, I created a new one: #14930.
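The check described in this reply, comparing column counts between table and partition metadata, could look roughly like the following. It is a sketch, not the test added in #14930: `TableMeta`, `PartitionMeta`, and `columnCountMismatch` are hypothetical names, and the metadata is modelled as plain lists of column names rather than Hive objects.

```scala
// Hypothetical models of the metadata; not Hive's table or partition classes.
final case class TableMeta(columnNames: Seq[String])
final case class PartitionMeta(columnNames: Seq[String])

object ColumnCountCheck {
  /** Returns None when every partition has as many columns as the table,
    * otherwise a description of the first partition that disagrees. */
  def columnCountMismatch(
      table: TableMeta,
      partitions: Seq[PartitionMeta]): Option[String] =
    partitions.zipWithIndex.collectFirst {
      case (p, i) if p.columnNames.size != table.columnNames.size =>
        s"partition $i has ${p.columnNames.size} columns, " +
          s"table has ${table.columnNames.size}"
    }
}
```

A test built on this helper would assert that the result is `None` for a newly created partitioned table.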

cloud-fan · 2016-09-01T08:52:17Z

ok to test

SparkQA · 2016-09-01T10:31:52Z

Test build #64771 has finished for PR 14515 at commit fd37123.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

davies · 2016-09-01T21:12:54Z

LGTM, merging this into 2.0 and master, thanks!

## What changes were proposed in this pull request?

This removes partition columns from column metadata of partitions to match tables.

A change introduced in SPARK-14388 removed partition columns from the column metadata of tables, but not for partitions. This causes TableReader to believe that the schema is different between table and partition, and create an unnecessary conversion object inspector in TableReader.

## How was this patch tested?

Existing unit tests.

Author: Brian Cho <[email protected]>

Closes #14515 from dafrista/partition-columns-metadata.

(cherry picked from commit 473d786)
Signed-off-by: Davies Liu <[email protected]>

…n metadata.

## What changes were proposed in this pull request?

Add unit test for changes made in PR #14515. It makes sure that a newly created table has the same number of columns in table and partition metadata. This test fails before the changes introduced in #14515.

## How was this patch tested?

Run new unit test.

Author: Brian Cho <[email protected]>

Closes #14930 from dafrista/partition-metadata-unit-test.

Remove partition columns from partition metadata.

fd37123

cloud-fan reviewed Sep 1, 2016

asfgit closed this in 473d786 Sep 1, 2016

bchocho mentioned this pull request Sep 2, 2016

[SPARK-16926] [SQL] Add unit test to compare table and partition column metadata. #14930

Closed

gatorsmile mentioned this pull request Feb 20, 2017

[SPARK-19390][SQL] Replace the unnecessary usages of hiveQlTable #16726

Closed
