[SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect result caused by the rule OptimizeMetadataOnlyQuery #20763
## What changes were proposed in this pull request?
```Scala
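import java.io.File  // the snippet also assumes `import spark.implicits._` for `toDF`
// `path` is assumed to be a java.io.File pointing at an empty temporary directory.
// The partition directories are nested in the order cOl3, cOl1, cOl5.
val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
  .write.json(tablePath.getCanonicalPath)
// Select only partition columns, in a different order and letter case, then deduplicate.
val df = spark.read.json(path.getCanonicalPath).select("CoL1", "CoL5", "CoL3").distinct()
df.show()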
```
It generates a wrong result (the expected row is `[a,e,c]`):
```
[c,e,a]
```
We have a bug in the rule `OptimizeMetadataOnlyQuery`: it should respect the attribute order in the original leaf node. This PR fixes it.
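To make the ordering problem concrete, here is a minimal plain-Scala sketch with no Spark dependency. It is not the code in `OptimizeMetadataOnlyQuery`. `Attr`, `PartitionOrderSketch`, `project`, and the assumed leaf-output order are all hypothetical, chosen only to show how pairing attributes with partition values by position can produce `[c,e,a]` when the two sequences are ordered differently.

```scala
import java.util.Locale

// Hypothetical, simplified stand-in for an attribute; not Spark's class.
final case class Attr(name: String)

object PartitionOrderSketch {
  private def key(s: String): String = s.toLowerCase(Locale.ROOT)

  // Assumed order in which the leaf node exposes the partition columns.
  val leafOutput: Seq[Attr] = Seq(Attr("cOl1"), Attr("cOl3"), Attr("cOl5"))
  // Partition columns and values in directory order: cOl3=c/cOl1=a/cOl5=e.
  val partitionNames: Seq[String] = Seq("cOl3", "cOl1", "cOl5")
  val partitionValues: Seq[String] = Seq("c", "a", "e")
  // Columns requested by the query, in a different order and letter case.
  val requested: Seq[String] = Seq("CoL1", "CoL5", "CoL3")

  // Pair attributes with values by position, then project the requested columns.
  def project(attrs: Seq[Attr], cols: Seq[String]): Seq[String] = {
    val valueByName = attrs.zip(partitionValues).map { case (a, v) => key(a.name) -> v }.toMap
    cols.map(c => valueByName(key(c)))
  }

  // Mismatched: attributes in leaf-output order, values in directory order.
  val mismatched: Seq[String] = project(leafOutput, requested)

  // Aligned: resolve each partition column by name so attributes follow the value order.
  val attrByName: Map[String, Attr] = leafOutput.map(a => key(a.name) -> a).toMap
  val aligned: Seq[String] = project(partitionNames.map(n => attrByName(key(n))), requested)

  def main(args: Array[String]): Unit = {
    println(mismatched.mkString("[", ",", "]"))
    println(aligned.mkString("[", ",", "]"))
  }
}
```

In this model the mismatched pairing yields `[c,e,a]` and the aligned one yields `[a,e,c]`. The point is only that the attribute list and the partition values must agree on order before they are combined.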
## How was this patch tested?
Added a test case
Author: gatorsmile <[email protected]>
Closes apache#20684 from gatorsmile/optimizeMetadataOnly.
## What changes were proposed in this pull request?
Inside `OptimizeMetadataOnlyQuery.getPartitionAttrs`, avoid using `zip` to generate the attribute map. Also include other minor updates to comments and formatting.
## How was this patch tested?
Existing test cases.
Author: Xingbo Jiang <[email protected]>
Closes apache#20693 from jiangxb1987/SPARK-23523.
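For illustration, the sketch below contrasts a `zip`-based way of building a name-to-attribute map with a `zip`-free one. It is plain Scala under assumptions: `NamedAttr` and `AttrMapSketch` are hypothetical names, and whether the code in `getPartitionAttrs` looked like either variant is not stated in this PR.

```scala
import java.util.Locale

// Hypothetical stand-in for an attribute; not Spark's class.
final case class NamedAttr(name: String)

object AttrMapSketch {
  val output: Seq[NamedAttr] =
    Seq(NamedAttr("cOl1"), NamedAttr("cOl3"), NamedAttr("cOl5"))

  // zip-based: builds a separate key sequence and pairs it back by position.
  val viaZip: Map[String, NamedAttr] =
    output.map(_.name.toLowerCase(Locale.ROOT)).zip(output).toMap

  // zip-free: each key is derived right next to the value it belongs to.
  val direct: Map[String, NamedAttr] =
    output.map(a => a.name.toLowerCase(Locale.ROOT) -> a).toMap
}
```

Both maps are equal here. The second form is easier to read because it does not rely on two parallel sequences staying aligned.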
Test build #88061 has finished for PR 20763 at commit
retest this please
cc @cloud-fan
LGTM
Test build #88136 has finished for PR 20763 at commit
retest this please
Test build #88141 has finished for PR 20763 at commit
retest this please
Test build #88144 has finished for PR 20763 at commit
retest this please
test this please
Test build #88162 has finished for PR 20763 at commit
retest this please
Test build #88163 has finished for PR 20763 at commit
retest this please
Test build #88182 has finished for PR 20763 at commit
retest this please
Test build #88186 has finished for PR 20763 at commit
Thanks! Merged to 2.3
…he rule OptimizeMetadataOnlyQuery

This PR is to backport #20684 and #20693 to Spark 2.3 branch

---

## What changes were proposed in this pull request?
```Scala
val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
  .write.json(tablePath.getCanonicalPath)
val df = spark.read.json(path.getCanonicalPath).select("CoL1", "CoL5", "CoL3").distinct()
df.show()
```
It generates a wrong result.
```
[c,e,a]
```
We have a bug in the rule `OptimizeMetadataOnlyQuery`. We should respect the attribute order in the original leaf node. This PR is to fix it.
## How was this patch tested?
Added a test case
Author: Xingbo Jiang <[email protected]>
Author: gatorsmile <[email protected]>
Closes #20763 from gatorsmile/backport23523.
This PR is to backport #20684 and #20693 to Spark 2.3 branch
## What changes were proposed in this pull request?
The example query in the commit message above generates a wrong result (`[c,e,a]`).
We have a bug in the rule `OptimizeMetadataOnlyQuery`. We should respect the attribute order in the original leaf node. This PR is to fix it.
## How was this patch tested?
Added a test case