
@gatorsmile
Member

This PR backports #20684 and #20693 to the Spark 2.3 branch.


## What changes were proposed in this pull request?

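```Scala
import java.io.File
import java.nio.file.Files
import spark.implicits._ // for toDF; spark-shell already imports this
val path = Files.createTempDirectory("spark-23523").toFile // assumed: any empty temp dir
val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
  .write.json(tablePath.getCanonicalPath)
val df = spark.read.json(path.getCanonicalPath).select("CoL1", "CoL5", "CoL3").distinct()
df.show()
```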

It generates a wrong result, `[c,e,a]`, instead of the expected `[a,e,c]`.

This is caused by a bug in the rule `OptimizeMetadataOnlyQuery`, which should respect the attribute order of the original leaf node. This PR fixes it.

## How was this patch tested?

Added a test case
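
One way to see how the reported `[c,e,a]` can arise is the simplified model below, written in plain Scala with no Spark dependency. It is a hypothetical sketch, not the code of `OptimizeMetadataOnlyQuery` or of the actual fix: it assumes the leaf node lists the columns as `cOl1` through `cOl5`, while the partition values arrive in directory order (`cOl3`, `cOl1`, `cOl5`). The names `AttributeOrderSketch`, `buggy`, and `fixed` are illustrative only.

```Scala
object AttributeOrderSketch {
  // Assumption: the leaf node lists the repro's columns in this order.
  val relationOutput: Seq[String] = Seq("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")

  // Partition columns and one row of their values, both in directory order
  // (cOl3=c/cOl1=a/cOl5=e).
  val partitionColumns: Seq[String] = Seq("cOl3", "cOl1", "cOl5")
  val partitionRow: Seq[String] = Seq("c", "a", "e")

  // Case-insensitive name match, as with Spark's default spark.sql.caseSensitive=false.
  private def sameName(a: String, b: String): Boolean = a.equalsIgnoreCase(b)

  // Binds attributes to row values by position (like a local relation),
  // then answers the projection by name.
  private def project(attrs: Seq[String], row: Seq[String], select: Seq[String]): Seq[String] = {
    val bound = attrs.zip(row)
    select.map(col => bound.collectFirst { case (a, v) if sameName(a, col) => v }.get)
  }

  // Buggy model: the attribute order (taken from relationOutput) disagrees
  // with the value order (which follows partitionColumns).
  def buggy(select: Seq[String]): Seq[String] = {
    val attrs = relationOutput.filter(a => partitionColumns.exists(p => sameName(p, a)))
    project(attrs, partitionRow, select)
  }

  // Fixed model: attributes are looked up by name in the same order as the
  // values, so each attribute is bound to its own value.
  def fixed(select: Seq[String]): Seq[String] = {
    val attrs = partitionColumns.map(p => relationOutput.find(a => sameName(a, p)).get)
    project(attrs, partitionRow, select)
  }

  def main(args: Array[String]): Unit = {
    val select = Seq("CoL1", "CoL5", "CoL3")
    println(buggy(select).mkString("[", ",", "]")) // prints [c,e,a]
    println(fixed(select).mkString("[", ",", "]")) // prints [a,e,c]
  }
}
```

In this model the buggy variant reproduces the wrong `[c,e,a]`, while the fixed variant returns `[a,e,c]`, which is what `select("CoL1", "CoL5", "CoL3")` should give for the row written in the repro. The point it illustrates is that attributes and values must be listed in the same order before they are bound positionally.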

gatorsmile and others added 2 commits March 7, 2018 15:45
…zeMetadataOnlyQuery

## What changes were proposed in this pull request?
```Scala
val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
  .write.json(tablePath.getCanonicalPath)
val df = spark.read.json(path.getCanonicalPath).select("CoL1", "CoL5", "CoL3").distinct()
df.show()
```

It generates a wrong result.
```
[c,e,a]
```

This is caused by a bug in the rule `OptimizeMetadataOnlyQuery`, which should respect the attribute order of the original leaf node. This PR fixes it.

## How was this patch tested?
Added a test case

Author: gatorsmile

Closes apache#20684 from gatorsmile/optimizeMetadataOnly.
## What changes were proposed in this pull request?

Inside `OptimizeMetadataOnlyQuery.getPartitionAttrs`, avoid using `zip` to generate the attribute map.
Also include other minor updates to comments and formatting.

## How was this patch tested?

Existing test cases.

Author: Xingbo Jiang

Closes apache#20693 from jiangxb1987/SPARK-23523.
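
The second commit above replaces a `zip`-built attribute map inside `getPartitionAttrs`. The commit message does not show the code, so the following is only a plain-Scala sketch of the general hazard of positional pairing versus a name-keyed map. `AttributeMapSketch`, `Attr`, `zipped`, and `byName` are hypothetical stand-ins, not Spark's attribute classes or the actual `getPartitionAttrs` implementation.

```Scala
object AttributeMapSketch {
  // Hypothetical stand-in for a resolved attribute (name plus a unique id).
  final case class Attr(name: String, id: Int)

  // Fragile: pairs names with attributes purely by position. It silently builds
  // wrong pairs if the two sequences are ordered differently, and silently drops
  // the extra elements if their lengths differ.
  def zipped(names: Seq[String], attrs: Seq[Attr]): Map[String, Attr] =
    names.map(_.toLowerCase).zip(attrs).toMap

  // Order-independent: key each attribute by its own lower-cased name, then
  // resolve the requested names; a missing name throws NoSuchElementException.
  def byName(names: Seq[String], attrs: Seq[Attr]): Seq[Attr] = {
    val attrMap = attrs.map(a => a.name.toLowerCase -> a).toMap
    names.map(n => attrMap(n.toLowerCase))
  }

  def main(args: Array[String]): Unit = {
    val names = Seq("cOl3", "cOl1", "cOl5")
    val attrs = Seq(Attr("cOl1", 1), Attr("cOl3", 3), Attr("cOl5", 5))
    println(zipped(names, attrs)("col3")) // prints Attr(cOl1,1): the wrong column
    println(byName(names, attrs))
  }
}
```

Keying each attribute by its own name makes the lookup independent of how either sequence happens to be ordered, which is the property the attribute-order fix relies on.
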
@SparkQA

SparkQA commented Mar 8, 2018

Test build #88061 has finished for PR 20763 at commit c0ac5ef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

retest this please

@gatorsmile
Member Author

cc @cloud-fan

@cloud-fan
Contributor

LGTM

@SparkQA

SparkQA commented Mar 9, 2018

Test build #88136 has finished for PR 20763 at commit c0ac5ef.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Mar 10, 2018

Test build #88141 has finished for PR 20763 at commit c0ac5ef.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Mar 10, 2018

Test build #88144 has finished for PR 20763 at commit c0ac5ef.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

retest this please

@gatorsmile
Member Author

test this please

@SparkQA

SparkQA commented Mar 11, 2018

Test build #88162 has finished for PR 20763 at commit c0ac5ef.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Mar 11, 2018

Test build #88163 has finished for PR 20763 at commit c0ac5ef.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Mar 12, 2018

Test build #88182 has finished for PR 20763 at commit c0ac5ef.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Mar 13, 2018

Test build #88186 has finished for PR 20763 at commit c0ac5ef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

Thanks! Merged to 2.3

asfgit pushed a commit that referenced this pull request Mar 13, 2018
…he rule OptimizeMetadataOnlyQuery

This PR backports #20684 and #20693 to the Spark 2.3 branch.

---

## What changes were proposed in this pull request?
```Scala
val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
  .write.json(tablePath.getCanonicalPath)
val df = spark.read.json(path.getCanonicalPath).select("CoL1", "CoL5", "CoL3").distinct()
df.show()
```

It generates a wrong result.
```
[c,e,a]
```

This is caused by a bug in the rule `OptimizeMetadataOnlyQuery`, which should respect the attribute order of the original leaf node. This PR fixes it.

## How was this patch tested?
Added a test case

Author: Xingbo Jiang
Author: gatorsmile

Closes #20763 from gatorsmile/backport23523.
@gatorsmile gatorsmile closed this Mar 19, 2018