[SPARK-42386][SQL] Rewrite HiveGenericUDF with Invoke #39949
After this PR, I will continue with #39865.
    val refTerm = ctx.addReferenceObj("this", this)
    val childrenEvals = children.map(_.genCode(ctx))

    val setDeferredObjects = childrenEvals.zipWithIndex.map {
Codegen is performance-critical. Previously, we generated code that set the values of deferredObjects directly, but in this PR we always create an Object[] to wrap the arguments, which can be bad for performance.
This is a signal that Invoke doesn't work very well for Hive UDFs. It's still valuable to have something like HiveGenericUDFInvokeAdapter to share code between the interpreted code path and the codegen code path.
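To make the allocation concern concrete, here is a minimal sketch in plain Scala with no Spark or Hive dependencies. `Slot` and `ToyUdf` are hypothetical stand-ins (not the real `DeferredObject` or `HiveGenericUDF` classes); the only point is that the array-based entry point needs a fresh argument array per row, while the direct path overwrites slots allocated once.

```scala
// Hypothetical stand-in for a reusable deferred-object slot.
final class Slot { var value: Any = null }

// Hypothetical UDF that sums its integer arguments.
final class ToyUdf(arity: Int) {
  // Allocated once and reused for every row.
  private val slots = Array.fill(arity)(new Slot)

  // Direct style: generated code can call this once per argument.
  def setArg(i: Int, v: Any): Unit = { slots(i).value = v }

  def evaluate(): Int = {
    var acc = 0
    var i = 0
    while (i < slots.length) {
      acc += slots(i).value.asInstanceOf[Int]
      i += 1
    }
    acc
  }

  // Array-based style: the caller must build `args` for every row,
  // which costs one extra array allocation per evaluation.
  def evaluateWithArray(args: Array[Any]): Int = {
    var i = 0
    while (i < args.length) {
      setArg(i, args(i))
      i += 1
    }
    evaluate()
  }
}
```

Both entry points return the same result; the difference is only in how many short-lived objects the hot loop creates.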
@cloud-fan
Should I continue to rewrite HiveSimpleUDF with Invoke, or is PR #39865 OK as it is?
I think we should refactor HiveGenericUDF first, then follow the same pattern to implement codegen for HiveSimpleUDF.
The refactor should add something like HiveGenericUDFInvokeAdapter in this PR to hold all the state. HiveGenericUDF then just manipulates this stateful object in both the interpreted code path and the codegen code path.
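A rough sketch of that shape, in plain Scala without Spark or Hive on the classpath: `ToyEvaluator` and `ToyExpression` are hypothetical names, not the classes that were eventually merged. The evaluator owns the mutable state, the interpreted path calls it directly, and the codegen path only emits source text that calls the same two methods.

```scala
// Hypothetical stateful object that holds the UDF and its argument buffer.
final class ToyEvaluator(f: Seq[Any] => Any, arity: Int) {
  private val args = new Array[Any](arity)
  def setArg(i: Int, v: Any): Unit = { args(i) = v }
  def evaluate(): Any = f(args.toSeq)
}

// Hypothetical expression; children are modelled as row => value functions.
final class ToyExpression(
    children: Seq[Map[String, Any] => Any],
    f: Seq[Any] => Any) {

  private lazy val evaluator = new ToyEvaluator(f, children.length)

  // Interpreted path: drive the evaluator directly.
  def eval(row: Map[String, Any]): Any = {
    children.zipWithIndex.foreach { case (child, i) =>
      evaluator.setArg(i, child(row))
    }
    evaluator.evaluate()
  }

  // Codegen path: emit Java-like text that calls the same methods.
  // `evalTerm` and `childTerms` are assumed to be names of generated variables.
  def genCode(evalTerm: String, childTerms: Seq[String]): String = {
    val sets = childTerms.zipWithIndex.map { case (term, i) =>
      s"$evalTerm.setArg($i, $term);"
    }
    (sets :+ s"Object result = $evalTerm.evaluate();").mkString("\n")
  }
}
```

Because both paths go through `setArg` and `evaluate`, any change to how arguments are stored or converted only has to be made in one place.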
OK, let me try that.
I have submitted a new PR, #40394, to refactor HiveGenericUDF.
Follow-up PR to implement codegen for HiveSimpleUDF: #40397
### What changes were proposed in this pull request?
The pr aims to refactor HiveGenericUDF.

### Why are the changes needed?
Following #39949. Make the code more concise.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

Closes #40394 from panbingkun/refactor_HiveGenericUDF.

Lead-authored-by: panbingkun <[email protected]>
Co-authored-by: panbingkun <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request?
- As a subtask of [SPARK-42050](https://issues.apache.org/jira/browse/SPARK-42050), this PR adds Codegen Support for HiveSimpleUDF
- Extract a `HiveUDFEvaluatorBase` class for the common behaviors of HiveSimpleUDFEvaluator & HiveGenericUDFEvaluator.

### Why are the changes needed?
- Improve codegen coverage and performance.
- Following #39949. Make the code more concise.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Add new UT.
Pass GA.

Closes #40397 from panbingkun/refactor_HiveSimpleUDF.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

### What changes were proposed in this pull request?
This PR aims to rewrite HiveGenericUDF with Invoke.
It follows #39555.

### Why are the changes needed?
With the help of RuntimeReplaceable, we don't have to manually implement the codegen logic.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Add new UT.
Pass GA.
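
The motivation in the description (an expression that declares a replacement does not need its own codegen) can be illustrated with a toy model. Everything below is hypothetical and self-contained; `ToyRuntimeReplaceable` and `ToyInvoke` only mimic the idea and are not Spark's `RuntimeReplaceable` or `Invoke`. In particular, this toy delegates at call time, whereas in Spark the substitution is performed on the expression tree before execution.

```scala
// Toy expression interface: an interpreted path and a codegen path.
trait Expr {
  def eval(row: Map[String, Any]): Any
  def genCode: String
}

// Leaf expression that reads a column from the row.
final case class Col(name: String) extends Expr {
  def eval(row: Map[String, Any]): Any = row(name)
  def genCode: String = s"""row.get("$name")"""
}

// Generic "call this function" expression; it implements both paths once.
final case class ToyInvoke(fnName: String, fn: Seq[Any] => Any, args: Seq[Expr])
    extends Expr {
  def eval(row: Map[String, Any]): Any = fn(args.map(_.eval(row)))
  def genCode: String = s"$fnName(${args.map(_.genCode).mkString(", ")})"
}

// An expression that only declares its replacement inherits both paths.
trait ToyRuntimeReplaceable extends Expr {
  def replacement: Expr
  final def eval(row: Map[String, Any]): Any = replacement.eval(row)
  final def genCode: String = replacement.genCode
}

// No hand-written eval or genCode here.
final case class ToyUpper(child: Expr) extends ToyRuntimeReplaceable {
  lazy val replacement: Expr =
    ToyInvoke("toUpperCase", xs => xs.head.toString.toUpperCase, Seq(child))
}
```

`ToyUpper` gets evaluation and code generation for free, which is the convenience the description refers to; the review thread explains why the array-wrapping cost made this a poor fit for Hive UDFs.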