-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-13123][SQL] Implement whole state codegen for sort. #11008
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This does the simplest thing of just assembly a row on consume and driving the underlying external sorter object.
|
|
Test build #50511 has finished for PR 11008 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is pretty ghetto... (although i understand maybe it's the simplest way to implement this)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? This is the state that needs to be kept between the two member functions in this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's ok here as discussed offline. i just found mutable state in here as a way to pass variable names through pretty brittle. maybe good to have a more general abstraction for this in codegen, but not that big of a deal right now/
|
Test build #50584 has finished for PR 11008 at commit
|
| } | ||
|
|
||
| ctx.currentVars = input | ||
| val code = GenerateUnsafeProjection.createCode(ctx, colExprs, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the child can produce UnsafeRow (for example, Exchange), we should have a way to avoid this unpack and pack again, or we will see regression (generated version slower than non-generated).
I think we can pass the variable for input row into doCosume, could be null. It's better to do this after #11274 , then we don't need to worry about whether should we create variables for input or not.
## What changes were proposed in this pull request? This PR adds support for implementing whole state codegen for sort. Builds heaving on nongli 's PR: #11008 (which actually implements the feature), and adds the following changes on top: - [x] Generated code updates peak execution memory metrics - [x] Unit tests in `WholeStageCodegenSuite` and `SQLMetricsSuite` ## How was this patch tested? New unit tests in `WholeStageCodegenSuite` and `SQLMetricsSuite`. Further, all existing sort tests should pass. Author: Sameer Agarwal <[email protected]> Author: Nong Li <[email protected]> Closes #11359 from sameeragarwal/sort-codegen.
|
@nongli Can you close this PR? |
## What changes were proposed in this pull request? This PR adds support for implementing whole state codegen for sort. Builds heaving on nongli 's PR: apache#11008 (which actually implements the feature), and adds the following changes on top: - [x] Generated code updates peak execution memory metrics - [x] Unit tests in `WholeStageCodegenSuite` and `SQLMetricsSuite` ## How was this patch tested? New unit tests in `WholeStageCodegenSuite` and `SQLMetricsSuite`. Further, all existing sort tests should pass. Author: Sameer Agarwal <[email protected]> Author: Nong Li <[email protected]> Closes apache#11359 from sameeragarwal/sort-codegen.
This does the simplest thing of just assembly a row on consume and driving the
underlying external sorter object.