[SPARK-6145][SQL] fix ORDER BY on nested fields #4904
Can one of the admins verify this patch?
Can you also try the test case: The root cause of the failure is also the incorrect alias; see:
Hi @chenghao-intel, your test fails on my code. I looked into it, and here are my thoughts:
Why not review #4892 for me? :)
test this please
Test build #28305 has started for PR 4904 at commit
Test build #28305 has finished for PR 4904 at commit
Test FAILed.
Since I'd really like to include this in the next RC, I've opened #4918 with the style error fixed.
I'd rather use `checkAnswer` for these tests, especially if we are going to put them in `SQLQuerySuite`. When `checkAnswer` fails, it gives nice exceptions, and we test end to end.
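To illustrate what a `checkAnswer`-style helper buys over ad hoc assertions, here is a toy sketch in Python for brevity. It is not Spark's `QueryTest.checkAnswer`: the name `check_answer`, the tuple-based rows, and the `is_sorted` flag are all assumptions. Rows are compared as multisets unless the query is sorted, and a mismatch raises an error listing both result sets.

```python
from collections import Counter


def check_answer(actual, expected, is_sorted=False):
    """Toy checkAnswer-style helper (hypothetical, not Spark's QueryTest).

    Rows are tuples. Row order is compared only when the query is sorted;
    otherwise rows are compared as multisets. On a mismatch, raise an error
    that lists both result sets so the failure is easy to read."""
    if is_sorted:
        matches = list(actual) == list(expected)
    else:
        matches = Counter(actual) == Counter(expected)
    if not matches:
        lines = ["Results do not match:", "== Expected =="]
        lines += [repr(row) for row in expected]
        lines.append("== Actual ==")
        lines += [repr(row) for row in actual]
        raise AssertionError("\n".join(lines))


# Usage: an ORDER BY test checks both the rows and their order.
check_answer([(1, "x"), (2, "y")], [(1, "x"), (2, "y")], is_sorted=True)
```

For ORDER BY tests the order-sensitive mode is the one that matters, since the row order is exactly what the fix is supposed to get right.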
Based on #4904 with style errors fixed.

`LogicalPlan#resolve` produces not only an `Attribute` but also a "`GetField` chain". So in `ResolveSortReferences`, after resolving the ordering expressions, we should collect not just the `Attribute` results but also the `Attribute` at the bottom of each "`GetField` chain".

Author: Wenchen Fan
Author: Michael Armbrust

Closes #4918 from marmbrus/pr/4904 and squashes the following commits:

997f84e [Michael Armbrust] fix style
3eedbfc [Wenchen Fan] fix 6145

(cherry picked from commit 5873c71)
Signed-off-by: Michael Armbrust
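A toy Python model of the fix described in this commit message. `Attribute`, `GetField`, `base_attribute`, and `missing_in_project` are hypothetical stand-ins, not Catalyst classes or Spark code. When computing which attributes the `Sort` needs but the `Project` below it does not output, each `GetField` chain contributes the `Attribute` at its bottom instead of being dropped.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Attribute:
    """Hypothetical stand-in for a resolved column reference."""
    name: str


@dataclass(frozen=True)
class GetField:
    """Hypothetical stand-in for struct field access such as `a.b`."""
    child: "Expr"
    field: str


Expr = Union[Attribute, GetField]


def base_attribute(expr: Expr) -> Optional[Attribute]:
    """Walk down a GetField chain and return the Attribute at its bottom."""
    while isinstance(expr, GetField):
        expr = expr.child
    return expr if isinstance(expr, Attribute) else None


def missing_in_project(resolved_orderings, project_output):
    """Attributes the Sort needs that the Project does not output.

    Keeping only bare Attributes would silently drop `a.b`; here every
    GetField chain contributes its base attribute instead."""
    missing = []
    for expr in resolved_orderings:
        attr = base_attribute(expr)
        if attr is not None and attr not in project_output and attr not in missing:
            missing.append(attr)
    return missing
```

For example, ordering by `a.b.x` and `d` under a `Project` that outputs `c` and `d` yields `[Attribute("a")]` as the only missing attribute.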
Can we just filter out everything except `AttributeReference`? I don't know all the corner cases of ORDER BY, and this way feels safer.
retest it please.
Jenkins doesn't seem to listen to me :(
This feels kind of hacky to me, as it's mixing the particulars of SQL analysis into the logical plan. Could we instead make `resolve` not do partial resolution, where it resolves the base attribute but not the `GetField`s on top? I think this change is the root cause of the regression.
ok to test
Test build #28354 has started for PR 4904 at commit
Test build #28354 has finished for PR 4904 at commit
Test FAILed.
Hi @marmbrus, I find it hard to resolve the base attribute but not the `GetField`s on top. When we get into
Test build #28372 has started for PR 4904 at commit
Test build #28372 has finished for PR 4904 at commit
Test PASSed.
Test build #28634 has started for PR 4904 at commit
Test build #28634 has finished for PR 4904 at commit
Test PASSed.
Hi @marmbrus, any more comments?
ping @marmbrus
I don't think this will work when there is an ORDER BY in a subquery and the outer query block tries to access the fields that were filtered out.
I see. You are creating a fake `Sort` node in order to resolve the attribute. If the attribute resolves, the resolved one replaces the unresolved one.
Thanks for working on this! This is a pretty tricky case that involves several rules interacting, and your test cases have been incredibly helpful. Here are my thoughts on the proposed solution:
I've taken your changes and made an alternative solution (#5189). I'd appreciate your feedback there.
This PR is based on work by cloud-fan in #4904, but with two differences:
- We isolate the logic for Sort's special handling into `ResolveSortReferences`.
- We avoid creating `UnresolvedGetField` expressions during resolution. Instead, we either resolve a `GetField` or return `None`. This keeps us from going down the wrong path early on.

Author: Michael Armbrust

Closes #5189 from marmbrus/nestedOrderBy and squashes the following commits:

b8cae45 [Michael Armbrust] fix another test
0f36a11 [Michael Armbrust] WIP
91820cd [Michael Armbrust] Fix bug.

(cherry picked from commit cd48ca5)
Signed-off-by: Michael Armbrust

Conflicts:
sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
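A toy sketch of the "either resolve or return `None`" behavior, in Python for brevity. It assumes a schema modeled as nested dicts (column name to type, with structs as dicts). The function `resolve` here is hypothetical and returns a string rendering of the `GetField` chain rather than a real Catalyst expression.

```python
from typing import Optional


def resolve(name: str, schema: dict) -> Optional[str]:
    """All-or-nothing resolution of a dotted name such as "a.b.c".

    Hypothetical toy: returns a rendering of the fully resolved GetField
    chain, or None if any part fails. It never returns the base column
    with unresolved fields stacked on top of it."""
    first, *fields = name.split(".")
    if first not in schema:
        return None
    current_type = schema[first]
    expr = first
    for field in fields:
        if not isinstance(current_type, dict) or field not in current_type:
            return None  # give up entirely instead of resolving partially
        current_type = current_type[field]
        expr = f"GetField({expr}, {field})"
    return expr
```

Returning `None` for `a.b.x` (rather than `a` with an unresolved field on top) leaves the name unresolved, so a later rule such as `ResolveSortReferences` can still try it against a different input.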
When resolving a `Sort` with `Project`'s output, we should use the base attribute, not the `GetField` chain, in `Project`'s output.
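The plan rewrite this describes can be sketched with toy plan nodes in Python. Everything here is a simplification and an assumption: `Relation`, `Project`, `Sort`, and `add_missing_to_project` are hypothetical, and ordering expressions are dotted strings whose first segment stands in for the base attribute. For `ORDER BY a.b`, the inner `Project` gains only the base attribute `a`, and an outer `Project` restores the original columns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Relation:
    """Hypothetical leaf node exposing its output columns."""
    output: List[str]


@dataclass
class Project:
    project_list: List[str]
    child: object


@dataclass
class Sort:
    ordering: List[str]
    child: object


def add_missing_to_project(sort: Sort) -> object:
    """Toy Sort-over-Project rewrite (assumes sort.child is a Project).

    Only the base attribute of each ordering ("a" for "a.b") is added to
    the inner Project; the GetField chain itself stays in the Sort. An
    outer Project then restores the original output columns."""
    project = sort.child
    base_attrs = [o.split(".")[0] for o in sort.ordering]
    missing = [a for a in dict.fromkeys(base_attrs)  # dedupe, keep order
               if a not in project.project_list and a in project.child.output]
    if not missing:
        return sort
    inner = Project(project.project_list + missing, project.child)
    return Project(list(project.project_list), Sort(sort.ordering, inner))
```

Adding `a` rather than `a.b` keeps the inner `Project`'s output a list of plain attributes, which the `GetField` chain in the `Sort` can then reference.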