[SPARK-11453][SQL] append data to partitioned table will messes up the result #9408
cc @yhuai
Test build #44804 has finished for PR 9408 at commit
cc @liancheng
Do we need to consider case sensitivity for partition column names here?
Overall looks good, left some minor comments.
Test build #45015 has finished for PR 9408 at commit
We only need to print column names, right?
Test build #45084 has finished for PR 9408 at commit
Actually, can we add a check to make sure that the partition columns do appear at the end of the column list?
Hmm, I think `normalizedParCols` already does this check?
retest this please
Test build #2013 has finished for PR 9408 at commit
Test build #2015 has finished for PR 9408 at commit
Test build #45305 has finished for PR 9408 at commit
LGTM. Merging to master and branch 1.6.
…e result

The reason is that:

1. For a partitioned Hive table, we move the partition columns after the data columns (e.g. `<a: Int, b: Int>` partitioned by `a` becomes `<b: Int, a: Int>`).
2. When appending data to a table, we use position to figure out how to match the input columns to the table's columns.

So when we append data to a partitioned table, we match the wrong columns between the input and the table. A solution is to reorder the input columns before matching by position, as we did for [`InsertIntoHadoopFsRelation`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala#L101-L105).

Author: Wenchen Fan <[email protected]>

Closes #9408 from cloud-fan/append.

(cherry picked from commit d8b50f7)
Signed-off-by: Yin Huai <[email protected]>
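The position-based mismatch described in the commit message can be illustrated with a small standalone model. This is a sketch in plain Python, not Spark code: the helper names `table_schema`, `append_by_position`, and `reorder_input` are hypothetical and only mimic the behavior the description attributes to partitioned Hive tables and to the proposed fix.

```python
# Hypothetical model of the bug; none of these helpers exist in Spark.

def table_schema(columns, partition_cols):
    """Place partition columns after the data columns, as the description
    says a partitioned Hive table does."""
    data_cols = [c for c in columns if c not in partition_cols]
    return data_cols + list(partition_cols)


def append_by_position(table_cols, row):
    """Match input values to table columns purely by position."""
    return dict(zip(table_cols, row))


def reorder_input(table_cols, input_cols, row):
    """Reorder the input row by column name so it lines up with the
    table's column order before the position-based match."""
    by_name = dict(zip(input_cols, row))
    return [by_name[c] for c in table_cols]


input_cols = ["a", "b"]
row = [1, 2]  # a = 1, b = 2

# <a, b> partitioned by a is stored as <b, a>.
table_cols = table_schema(input_cols, partition_cols=["a"])

# Without reordering, the values land in the wrong columns.
broken = append_by_position(table_cols, row)

# Reordering the input first restores the intended mapping.
fixed = append_by_position(table_cols, reorder_input(table_cols, input_cols, row))
```

In this model `broken` maps `b` to the value meant for `a` and vice versa, while `fixed` keeps each value under its own column name, which is the effect the reordering step is meant to have.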