@panbingkun panbingkun commented Feb 9, 2023

What changes were proposed in this pull request?

This PR aims to rewrite HiveGenericUDF with Invoke.
Follow-up to #39555.

Why are the changes needed?

With the help of RuntimeReplaceable, we don't have to manually implement the codegen logic.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Add new UT.
Pass GA.

@panbingkun
Contributor Author

After this PR, I will continue with #39865.
Following the same idea, rewriting HiveSimpleUDF with Invoke should also work.

val refTerm = ctx.addReferenceObj("this", this)
val childrenEvals = children.map(_.genCode(ctx))

val setDeferredObjects = childrenEvals.zipWithIndex.map {
Contributor

Codegen is performance-critical. Previously, we generated code that sets values on deferredObjects directly, but in this PR we always create an Object[] to wrap the arguments, which can be bad for performance.

This is a signal that Invoke doesn't work very well for Hive UDFs. It's still valuable to have something like HiveGenericUDFInvokeAdapter to share code between the interpreted code path and the codegen code path.
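
To make the allocation concern concrete, here is a minimal, self-contained sketch in plain Scala. It is not the Spark or Hive code: `Slot`, `sum`, `evalWithWrapper`, and `evalWithSlots` are hypothetical names that only stand in for the deferred objects and the two calling styles being compared.

```scala
object DeferredSlotsSketch {
  // Hypothetical stand-in for a Hive deferred object: a reusable, mutable holder.
  final class Slot { var value: Any = null }

  // Hypothetical UDF body: adds up the integer held in each slot.
  def sum(slots: Array[Slot]): Int = {
    var total = 0
    var i = 0
    while (i < slots.length) {
      total += slots(i).value.asInstanceOf[Int]
      i += 1
    }
    total
  }

  // Style A: each call first wraps the arguments in a freshly allocated array,
  // then copies them into the slots. This is one extra allocation per row.
  def evalWithWrapper(slots: Array[Slot], a: Int, b: Int): Int = {
    val wrapped = Array[Any](a, b)
    var i = 0
    while (i < wrapped.length) {
      slots(i).value = wrapped(i)
      i += 1
    }
    sum(slots)
  }

  // Style B: each argument is written straight into its preallocated slot,
  // so no intermediate array is created.
  def evalWithSlots(slots: Array[Slot], a: Int, b: Int): Int = {
    slots(0).value = a
    slots(1).value = b
    sum(slots)
  }
}
```

Both styles compute the same result; the only difference is the per-call array in style A, which matters when the code runs once per row.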

Contributor Author

@cloud-fan
Should I continue rewriting HiveSimpleUDF with Invoke,
or is PR #39865 OK?

Contributor

I think we should refactor HiveGenericUDF first, then follow it to implement codegen of HiveSimpleUDF.

What the refactor should do is add something like HiveGenericUDFInvokeAdapter in this PR to hold all the state. HiveGenericUDF then just manipulates this stateful object in both the interpreted code path and codegen.
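
As a rough illustration of that shape, the sketch below uses plain Scala with no Spark or Hive dependencies. `UdfEvaluator`, `setArg`, `evaluate`, `interpretedEval`, and `generatedSource` are all hypothetical names, and the real adapter would hold Hive-specific state such as object inspectors and deferred objects. The point is only that one stateful object owns the argument buffer, and that the interpreted path and the generated code both drive it through the same small interface.

```scala
object EvaluatorSketch {
  // Hypothetical stateful helper: owns the argument buffer and the function.
  final class UdfEvaluator(arity: Int, f: Array[Any] => Any) {
    // Allocated once and reused for every row.
    private val args = new Array[Any](arity)

    def setArg(i: Int, v: Any): Unit = { args(i) = v }

    def evaluate(): Any = f(args)
  }

  // Interpreted path: evaluate each child, store it, then call evaluate().
  def interpretedEval(ev: UdfEvaluator, children: Seq[() => Any]): Any = {
    children.zipWithIndex.foreach { case (child, i) => ev.setArg(i, child()) }
    ev.evaluate()
  }

  // Codegen path: emit source text that drives the same object.
  // refTerm is the name under which the evaluator is visible to generated code.
  def generatedSource(refTerm: String, childValues: Seq[String]): String = {
    val sets = childValues.zipWithIndex.map { case (v, i) =>
      s"$refTerm.setArg($i, $v);"
    }
    (sets :+ s"Object result = $refTerm.evaluate();").mkString("\n")
  }
}
```

With this layout, the generated code sets each argument directly and never builds a wrapper array, while the interpreted path reuses exactly the same evaluator logic.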

Contributor Author

OK, let me try that.

Contributor Author

I have submitted a new PR, #40394, to refactor HiveGenericUDF.

Contributor Author

Following that, #40397 implements codegen for HiveSimpleUDF.

cloud-fan pushed a commit that referenced this pull request Mar 15, 2023
### What changes were proposed in this pull request?
This PR aims to refactor HiveGenericUDF.

### Why are the changes needed?
Following #39949.
Make the code more concise.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

Closes #40394 from panbingkun/refactor_HiveGenericUDF.

Lead-authored-by: panbingkun <[email protected]>
Co-authored-by: panbingkun <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
cloud-fan pushed a commit that referenced this pull request Mar 22, 2023
### What changes were proposed in this pull request?
- As a subtask of [SPARK-42050](https://issues.apache.org/jira/browse/SPARK-42050), this PR adds Codegen Support for HiveSimpleUDF
- Extract a `HiveUDFEvaluatorBase` class for the common behaviors of HiveSimpleUDFEvaluator & HiveGenericUDFEvaluator.

### Why are the changes needed?
- Improve codegen coverage and performance.
- Following #39949. Make the code more concise.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Add new UT.
Pass GA.

Closes #40397 from panbingkun/refactor_HiveSimpleUDF.

Authored-by: panbingkun <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@panbingkun panbingkun closed this May 14, 2023
wangyum pushed a commit that referenced this pull request May 26, 2023
…pleUDF (#1288)
