-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-7289] [SQL] Combine Limit and Sort to avoid total ordering #5821
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Merged build triggered. |
|
Merged build started. |
|
Test build #31474 has started for PR 5821 at commit |
|
Test build #31474 has finished for PR 5821 at commit
|
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
/cc @rxin can you take a look at this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add some explanation saying it is more efficient because it is expensive to do total ordering in distributed setting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also this rule doesn't belong in combine limits does it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, after thinking more about this, i think we can not add this rule, because push down sort leads to sort global data.
But here is another case we can optimize, i am updating this.
|
Merged build triggered. |
|
Merged build started. |
|
Test build #31554 has started for PR 5821 at commit |
|
Build triggered. |
|
Build started. |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #31557 has started for PR 5821 at commit |
|
Build finished. Test FAILed. |
|
Test FAILed. |
|
Merged build triggered. |
|
Merged build started. |
|
Test build #31563 has started for PR 5821 at commit |
|
Test build #31554 has finished for PR 5821 at commit
|
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
Test build #31557 has finished for PR 5821 at commit
|
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
Test build #31563 has finished for PR 5821 at commit
|
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
@rxin, i have updated this PR title and description, please take a look. |
|
cc @yhuai can you take a look at this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will LimitPushdown be a better name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think CombineLimitSort is more descriptive now, since this is single and special case of combine limit and sort.
we can rename it in future when there are more push down cases we add here
|
ping @yhuai is this ok to go? |
|
Test build #786 has started for PR 5821 at commit |
|
Test build #786 has finished for PR 5821 at commit
|
|
I'm not sold on this optimization. Its pretty weird to put an |
|
yes, but we can not restrict user do not write sql like this. |
|
ok to close this |
|
cc @marmbrus actually I think we need a rule like this. @mengxr just constructed a case: |
|
Might also be caused by https://github.com/apache/spark/pull/5831/files#diff-a636a87d8843eeccca90140be91d4fafR156 |
|
@rxin, I think that it would be better to make the planning of TakeOrdered more general such that it can handle internal projections. Either that or we could also add a rule that pushes projections beneath |
Optimize following sql
select key from (select * from testData order by value) t limit 5before this PR:
after this PR
with this rule we combine limit and sort, so it will plan as takeordered which can avoid total ordering