[SPARK-23523] [SQL] [BACKPORT-2.3] Fix the incorrect result caused by the rule OptimizeMetadataOnlyQuery #20763

gatorsmile · 2018-03-07T23:48:40Z

This PR backports #20684 and #20693 to the Spark 2.3 branch.

What changes were proposed in this pull request?

val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
  .write.json(tablePath.getCanonicalPath)
val df = spark.read.json(path.getCanonicalPath).select("CoL1", "CoL5", "CoL3").distinct()
df.show()

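This query generates a wrong result:

[c,e,a]

Given the select order (CoL1, CoL5, CoL3) and the values written above, the row should be [a,e,c]. The bug is in the rule OptimizeMetadataOnlyQuery, which should respect the attribute order of the original leaf node. This PR fixes it.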
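To make the ordering problem concrete, here is a minimal sketch in plain Scala with no Spark dependency. It is a simplified model, not the code of `OptimizeMetadataOnlyQuery`: `Attr`, `leafOutput`, `partitions`, `pairByPosition`, `pairByName` and `select` are all hypothetical names. The assumption that the leaf node lists columns in data-file order while partition values arrive in directory order is made for illustration only.

```scala
object AttrOrderSketch {
  // Hypothetical stand-in for a resolved column; not Spark's Attribute class.
  final case class Attr(name: String)

  // Assumption for illustration: the leaf node lists columns in data-file order.
  val leafOutput: Seq[Attr] =
    Seq("cOl1", "cOl2", "cOl3", "cOl4", "cOl5").map(Attr(_))

  // Partition columns and their values in directory order: cOl3=c/cOl1=a/cOl5=e.
  val partitions: Seq[(String, String)] =
    Seq("cOl3" -> "c", "cOl1" -> "a", "cOl5" -> "e")

  // Case-insensitive comparison, since the repro selects "CoL1" rather than "cOl1".
  private def sameName(a: String, b: String): Boolean = a.equalsIgnoreCase(b)

  // Fragile: keep the partition attributes in leaf order, then pair them with
  // the partition values purely by position.
  def pairByPosition: Seq[(Attr, String)] =
    leafOutput
      .filter(a => partitions.exists { case (n, _) => sameName(n, a.name) })
      .zip(partitions.map(_._2))

  // Safer: look each partition column up by name, so a value always stays
  // attached to its own attribute regardless of ordering.
  def pairByName: Seq[(Attr, String)] =
    partitions.flatMap { case (n, v) =>
      leafOutput.find(a => sameName(a.name, n)).map(_ -> v)
    }

  // Project the requested columns out of one row of (attribute, value) pairs.
  def select(row: Seq[(Attr, String)], cols: String*): Seq[String] =
    cols.flatMap(c => row.collectFirst { case (a, v) if sameName(a.name, c) => v })
}
```

In this model, `AttrOrderSketch.select(AttrOrderSketch.pairByPosition, "CoL1", "CoL5", "CoL3")` yields `c, e, a`, the wrong row reported above, while the same call with `pairByName` yields `a, e, c`.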

How was this patch tested?

Added a test case

…zeMetadataOnlyQuery

## What changes were proposed in this pull request?

```Scala
val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
  .write.json(tablePath.getCanonicalPath)
val df = spark.read.json(path.getCanonicalPath).select("CoL1", "CoL5", "CoL3").distinct()
df.show()
```

It generates a wrong result.

```
[c,e,a]
```

We have a bug in the rule `OptimizeMetadataOnlyQuery`. We should respect the attribute order in the original leaf node. This PR is to fix it.

## How was this patch tested?

Added a test case

Author: gatorsmile

Closes apache#20684 from gatorsmile/optimizeMetadataOnly.

## What changes were proposed in this pull request?

Inside `OptimizeMetadataOnlyQuery.getPartitionAttrs`, avoid using `zip` to generate the attribute map. Also includes other minor updates to comments and formatting.

## How was this patch tested?

Existing test cases.

Author: Xingbo Jiang

Closes apache#20693 from jiangxb1987/SPARK-23523.

SparkQA · 2018-03-08T02:59:34Z

Test build #88061 has finished for PR 20763 at commit c0ac5ef.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-03-09T21:16:12Z

retest this please

gatorsmile · 2018-03-09T21:59:42Z

cc @cloud-fan

cloud-fan · 2018-03-09T22:33:21Z

LGTM

SparkQA · 2018-03-09T23:40:33Z

Test build #88136 has finished for PR 20763 at commit c0ac5ef.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-03-09T23:51:56Z

retest this please

SparkQA · 2018-03-10T01:23:06Z

Test build #88141 has finished for PR 20763 at commit c0ac5ef.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2018-03-10T05:45:31Z

retest this please

SparkQA · 2018-03-10T08:05:01Z

Test build #88144 has finished for PR 20763 at commit c0ac5ef.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-03-10T17:32:37Z

retest this please

gatorsmile · 2018-03-11T15:34:40Z

test this please

SparkQA · 2018-03-11T18:22:30Z

Test build #88162 has finished for PR 20763 at commit c0ac5ef.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-03-11T20:01:54Z

retest this please

SparkQA · 2018-03-11T21:36:38Z

Test build #88163 has finished for PR 20763 at commit c0ac5ef.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2018-03-12T19:28:41Z

retest this please

SparkQA · 2018-03-12T22:15:14Z

Test build #88182 has finished for PR 20763 at commit c0ac5ef.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2018-03-12T22:36:00Z

retest this please

SparkQA · 2018-03-13T01:46:10Z

Test build #88186 has finished for PR 20763 at commit c0ac5ef.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2018-03-13T04:47:30Z

Thanks! Merged to 2.3

…he rule OptimizeMetadataOnlyQuery

This PR is to backport #20684 and #20693 to Spark 2.3 branch

---

## What changes were proposed in this pull request?

```Scala
val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
  .write.json(tablePath.getCanonicalPath)
val df = spark.read.json(path.getCanonicalPath).select("CoL1", "CoL5", "CoL3").distinct()
df.show()
```

It generates a wrong result.

```
[c,e,a]
```

We have a bug in the rule `OptimizeMetadataOnlyQuery`. We should respect the attribute order in the original leaf node. This PR is to fix it.

## How was this patch tested?

Added a test case

Author: Xingbo Jiang
Author: gatorsmile

Closes #20763 from gatorsmile/backport23523.

gatorsmile and others added 2 commits March 7, 2018 15:45

mkressirer mentioned this pull request Mar 13, 2018

[SPARK-23523][SQL][BACKPORT-2.3] Fix the incorrect result caused by t… toasttab/spark#13

Merged

gatorsmile closed this Mar 19, 2018

4 participants