[SPARK-12989] [SQL] Delaying Alias Cleanup after ExtractWindowExpressions #10963
Test build #50246 has finished for PR 10963 at commit
Just curious, why do we have to call toAttribute at all?
The whole part is confusing. Let me rewrite this function to make it clear.
Well, wait a second. I would like to fix this in branch-1.6 so a minimal fix would be preferred (and we can refactor master later if there is a good reason).
Sure, let me change it to ne without calling toAttribute and move the test case to DataFrameWindowSuite. Thanks!
Test build #50318 has finished for PR 10963 at commit
# Conflicts:
#   sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowSuite.scala
Test build #50481 has finished for PR 10963 at commit
Thanks, merging to master and 1.6.
JIRA: https://issues.apache.org/jira/browse/SPARK-12989

In the rule `ExtractWindowExpressions`, we simply replace each alias with the corresponding attribute. However, this causes an issue exposed by the following case:

```scala
val data = Seq(("a", "b", "c", 3), ("c", "b", "a", 3)).toDF("A", "B", "C", "num")
  .withColumn("Data", struct("A", "B", "C"))
  .drop("A")
  .drop("B")
  .drop("C")

val winSpec = Window.partitionBy("Data.A", "Data.B").orderBy($"num".desc)
data.select($"*", max("num").over(winSpec) as "max").explain(true)
```

In this case, both `Data.A` and `Data.B` are aliases in `WindowSpecDefinition`. If we replace these alias expressions with their alias names, we can no longer tell what they refer to, since they will not be put in `missingExpr` either.

Author: gatorsmile <[email protected]>
Author: xiaoli <[email protected]>
Author: Xiao Li <[email protected]>

Closes #10963 from gatorsmile/seletStarAfterColDrop.

(cherry picked from commit 33c8a49)
Signed-off-by: Michael Armbrust <[email protected]>
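For readers who want to see the failure mode in isolation, here is a minimal sketch in plain Scala. It is a toy model, not Spark code: `Expr`, `Attr`, `GetField`, `Alias`, `aliasToAttr`, `trimAlias`, and `resolvable` are hypothetical names invented for this illustration, and the alias name `"A"` for `Data.A` is an assumption. It only tries to show why swapping an alias for an attribute that carries just its name can leave a reference the child plan does not produce, while keeping the aliased expression does not.

```scala
// Toy model only: hypothetical stand-ins, NOT Spark's Catalyst classes.
sealed trait Expr
case class Attr(name: String) extends Expr                    // reference to a column of the child plan
case class GetField(child: Expr, field: String) extends Expr  // struct field access, e.g. Data.A
case class Alias(child: Expr, name: String) extends Expr      // a name given to a computed expression

object AliasCleanupSketch {
  // Eager cleanup: replace every alias with an attribute that keeps only the alias name.
  def aliasToAttr(e: Expr): Expr = e match {
    case Alias(_, name) => Attr(name)
    case GetField(c, f) => GetField(aliasToAttr(c), f)
    case a: Attr        => a
  }

  // Delayed cleanup: drop the alias wrapper but keep the expression it named.
  def trimAlias(e: Expr): Expr = e match {
    case Alias(child, _) => trimAlias(child)
    case GetField(c, f)  => GetField(trimAlias(c), f)
    case a: Attr         => a
  }

  // Column names the expression needs from the child plan's output.
  def references(e: Expr): Set[String] = e match {
    case Attr(n)        => Set(n)
    case GetField(c, _) => references(c)
    case Alias(c, _)    => references(c)
  }

  // An expression can be evaluated only if every column it references is available.
  def resolvable(e: Expr, childOutput: Set[String]): Boolean =
    references(e).subsetOf(childOutput)

  def main(args: Array[String]): Unit = {
    // After the three drop calls, only "num" and the struct column "Data" remain.
    val childOutput = Set("num", "Data")
    // Assumed shape of the partition expression for "Data.A".
    val partitionExpr = Alias(GetField(Attr("Data"), "A"), "A")

    println(resolvable(aliasToAttr(partitionExpr), childOutput)) // attribute "A" is not in the output
    println(resolvable(trimAlias(partitionExpr), childOutput))   // Data.A only needs "Data"
  }
}
```

In this toy setting the eager replacement yields a bare `Attr("A")`, which nothing in the child output provides, whereas the trimmed expression still points at `Data`. How the actual rule tracks such expressions (for example through `missingExpr`) is not modeled here.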