[SPARK-23523] [SQL] [FOLLOWUP] Minor refactor of OptimizeMetadataOnlyQuery #20693
LGTM, pending Jenkins.
Test build #87775 has finished for PR 20693 at commit
data: Seq[InternalRow] = Nil,
// Indicates whether this relation has data from a streaming source.
override val isStreaming: Boolean = false)
case class LocalRelation(output: Seq[Attribute],
This style change should not have been included in the original commit, but since it's already there, let's not bother reverting it.
retest this please
LGTM
gatorsmile
left a comment
LGTM
Test build #87783 has finished for PR 20693 at commit
Test build #87787 has finished for PR 20693 at commit
dongjoon-hyun
left a comment
+1, LGTM.
## What changes were proposed in this pull request?

Inside `OptimizeMetadataOnlyQuery.getPartitionAttrs`, avoid using `zip` to generate attribute map. Also include other minor update of comments and format.

## How was this patch tested?

Existing test cases.

Author: Xingbo Jiang <[email protected]>

Closes apache#20693 from jiangxb1987/SPARK-23523.
…he rule OptimizeMetadataOnlyQuery

This PR is to backport #20684 and #20693 to Spark 2.3 branch

---

## What changes were proposed in this pull request?

```Scala
val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
  .write.json(tablePath.getCanonicalPath)

val df = spark.read.json(path.getCanonicalPath).select("CoL1", "CoL5", "CoL3").distinct()
df.show()
```

It generates a wrong result.

```
[c,e,a]
```

We have a bug in the rule `OptimizeMetadataOnlyQuery`. We should respect the attribute order in the original leaf node. This PR is to fix it.

## How was this patch tested?

Added a test case

Author: Xingbo Jiang <[email protected]>
Author: gatorsmile <[email protected]>

Closes #20763 from gatorsmile/backport23523.
## What changes were proposed in this pull request?

Inside `OptimizeMetadataOnlyQuery.getPartitionAttrs`, avoid using `zip` to generate the attribute map. Also includes other minor updates to comments and formatting.

## How was this patch tested?

Existing test cases.
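
To illustrate why building an attribute map with `zip` is fragile, here is a minimal, self-contained sketch. It is not the code from this PR or from Spark: `Attr`, `AttrMapSketch`, `byPosition`, and `byName` are hypothetical names, and `Attr` is a simplified stand-in for a Catalyst attribute. The name-based lookup shown is just one possible alternative to `zip`; the actual change in `getPartitionAttrs` may differ. The example also does not try to reproduce the exact wrong output reported for the rule.

```scala
// Simplified stand-in for a Catalyst attribute (hypothetical, not Spark's class).
final case class Attr(name: String, id: Int)

object AttrMapSketch {
  // Fragile: pairs attributes purely by position, so it silently
  // mis-pairs them when the two sequences are ordered differently.
  def byPosition(partitionCols: Seq[Attr], output: Seq[Attr]): Map[String, Attr] =
    partitionCols.map(_.name.toLowerCase).zip(output).toMap

  // Sturdier: looks each partition column up by case-insensitive name,
  // so the result does not depend on the order of either sequence.
  def byName(partitionCols: Seq[Attr], output: Seq[Attr]): Map[String, Attr] = {
    val outputByName = output.map(a => a.name.toLowerCase -> a).toMap
    partitionCols.flatMap { p =>
      val key = p.name.toLowerCase
      outputByName.get(key).map(key -> _)
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Partition columns in directory order; output attributes in select order.
    val partitionCols = Seq(Attr("cOl3", 0), Attr("cOl1", 1), Attr("cOl5", 2))
    val output = Seq(Attr("CoL1", 10), Attr("CoL5", 11), Attr("CoL3", 12))

    println(byPosition(partitionCols, output)("col3").name) // prints "CoL1" (mis-paired)
    println(byName(partitionCols, output)("col3").name)     // prints "CoL3"
  }
}
```

With positional pairing, the partition column `cOl3` ends up mapped to the first output attribute regardless of its name, which is the kind of ordering mistake that resolving attributes by name avoids.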