Skip to content

Conversation

@bchocho
Copy link
Contributor

@bchocho bchocho commented Aug 5, 2016

What changes were proposed in this pull request?

This removes partition columns from column metadata of partitions to match tables.

A change introduced in SPARK-14388 removed partition columns from the column metadata of tables, but not for partitions. This causes TableReader to believe that the schema is different between table and partition, and create an unnecessary conversion object inspector in TableReader.

How was this patch tested?

Existing unit tests.

@bchocho
Copy link
Contributor Author

bchocho commented Aug 8, 2016

This triggers the else case here: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala#L368.

cc: @andrewor14
Can you please take a look? The columns for tables were removed in SPARK-14388.


// Note: In Hive the schema and partition columns must be disjoint sets
val schema = catalogTable.schema.map(toHiveColumn).filter { c =>
!catalogTable.partitionColumnNames.contains(c.getName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah good catch! It would be better if we can have a test to prove the unnecessary conversion object inspector is removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan I've instead created a unit test that simply checks if the number of columns in the table and partition metadata are the same for a newly created table. Since this PR has been merged already, I created a new one: #14930.

@cloud-fan
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented Sep 1, 2016

Test build #64771 has finished for PR 14515 at commit fd37123.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Copy link
Contributor

davies commented Sep 1, 2016

LGTM, merging this into 2.0 and master, thanks!

@asfgit asfgit closed this in 473d786 Sep 1, 2016
asfgit pushed a commit that referenced this pull request Sep 1, 2016
## What changes were proposed in this pull request?

This removes partition columns from column metadata of partitions to match tables.

A change introduced in SPARK-14388 removed partition columns from the column metadata of tables, but not for partitions. This causes TableReader to believe that the schema is different between table and partition, and create an unnecessary conversion object inspector in TableReader.

## How was this patch tested?

Existing unit tests.

Author: Brian Cho <[email protected]>

Closes #14515 from dafrista/partition-columns-metadata.

(cherry picked from commit 473d786)
Signed-off-by: Davies Liu <[email protected]>
asfgit pushed a commit that referenced this pull request Sep 2, 2016
…n metadata.

## What changes were proposed in this pull request?

Add unit test for changes made in PR #14515. It makes sure that a newly created table has the same number of columns in table and partition metadata. This test fails before the changes introduced in #14515.

## How was this patch tested?

Run new unit test.

Author: Brian Cho <[email protected]>

Closes #14930 from dafrista/partition-metadata-unit-test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants