[SPARK-11453][SQL] append data to partitioned table will messes up the result #9408

cloud-fan · 2015-11-02T11:09:21Z

The reason is that:

1. For a partitioned Hive table, we move the partition columns after the data columns (e.g. `<a: Int, b: Int>` partitioned by `a` becomes `<b: Int, a: Int>`).
2. When appending data to a table, we match input columns to the table's columns by position.

So when we append data to a partitioned table, we match the wrong columns between the input and the table. A solution is to reorder the input columns before matching by position, like what we did for `InsertIntoHadoopFsRelation`.
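To make the mismatch concrete, here is a minimal sketch in plain Python. It is not the code in this patch and it does not use Spark. The helper names `reorder_for_partitioned_table` and `match_by_position` are hypothetical and exist only to model the positional matching and the reordering described above.

```python
def reorder_for_partitioned_table(input_columns, partition_columns):
    """Move partition columns to the end; data columns keep their relative order.

    Hypothetical helper for illustration only, not Spark's implementation.
    """
    partition_set = set(partition_columns)
    missing = partition_set - set(input_columns)
    if missing:
        raise ValueError(f"partition columns not found in input: {sorted(missing)}")
    data_columns = [c for c in input_columns if c not in partition_set]
    return data_columns + list(partition_columns)


def match_by_position(input_columns, table_columns):
    """Pair each input column with the table column at the same position."""
    if len(input_columns) != len(table_columns):
        raise ValueError("column count mismatch")
    return list(zip(input_columns, table_columns))


# A table declared as <a, b> and partitioned by a is stored as <b, a>.
table_columns = ["b", "a"]
input_columns = ["a", "b"]

# Without reordering, matching by position pairs input a with table b.
wrong = match_by_position(input_columns, table_columns)

# Reordering the input first lines each column up with its namesake.
fixed = match_by_position(
    reorder_for_partitioned_table(input_columns, ["a"]), table_columns
)
```

In this model, `wrong` pairs each input column with the other table column, while `fixed` pairs every column with the table column of the same name.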

cloud-fan · 2015-11-02T11:10:54Z

cc @yhuai

SparkQA · 2015-11-02T13:15:56Z

Test build #44804 has finished for PR 9408 at commit b1512b0.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
 * `case class Corr(`
 * `case class Corr(left: Expression, right: Expression)`
 * `case class RepartitionByExpression(`
 * `logInfo(s"Hive class not found $e")`
 * `logDebug("Hive class not found", e)`

cloud-fan · 2015-11-03T02:45:02Z

cc @liancheng

liancheng · 2015-11-04T08:31:20Z

sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala

Do we need to consider case sensitivity for partition column names here?

liancheng · 2015-11-04T08:33:13Z

Overall looks good, left some minor comments.

SparkQA · 2015-11-04T14:59:04Z

Test build #45015 has finished for PR 9408 at commit 06b96ed.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yhuai · 2015-11-04T21:03:56Z

sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala

We only need to print column names, right?

SparkQA · 2015-11-05T04:17:20Z

Test build #45084 has finished for PR 9408 at commit e682c86.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yhuai · 2015-11-05T23:13:47Z

sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala

actually, can we have a check to make sure that partition columns do appear at the end of the column list?

hmmm, I think normalizedParCols already did this check?

cloud-fan · 2015-11-08T13:51:44Z

retest this please

SparkQA · 2015-11-08T16:31:20Z

Test build #2013 has finished for PR 9408 at commit 186d281.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-11-08T20:54:06Z

Test build #2015 has finished for PR 9408 at commit 186d281.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2015-11-08T21:30:35Z

Test build #45305 has finished for PR 9408 at commit 4a61037.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

yhuai · 2015-11-09T05:01:11Z

LGTM. Merging to master and branch 1.6.

…e result

The reason is that:

1. For partitioned hive table, we will move the partitioned columns after data columns. (e.g. `<a: Int, b: Int>` partition by `a` will become `<b: Int, a: Int>`)
2. When append data to table, we use position to figure out how to match input columns to table's columns.

So when we append data to partitioned table, we will match wrong columns between input and table. A solution is reordering the input columns before match by position, like what we did for [`InsertIntoHadoopFsRelation`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala#L101-L105)

Author: Wenchen Fan <[email protected]>

Closes #9408 from cloud-fan/append.

(cherry picked from commit d8b50f7)
Signed-off-by: Yin Huai <[email protected]>

fix bug of appending data to partitioned table

b1512b0

liancheng reviewed Nov 4, 2015

cloud-fan added 2 commits November 4, 2015 17:14

Merge remote-tracking branch 'origin/master' into append

1e19944

address comments

06b96ed

yhuai reviewed Nov 4, 2015


address comments

e682c86

cloud-fan force-pushed the append branch from 9e543d4 to e682c86 Compare November 5, 2015 02:07

yhuai reviewed Nov 5, 2015

add test

186d281

cloud-fan added 2 commits November 9, 2015 00:46

Merge remote-tracking branch 'origin/master' into append

a3046b4

rebase

4a61037

asfgit closed this in d8b50f7 Nov 9, 2015


4 participants