@ConeyLiu
Contributor

What changes were proposed in this pull request?

Right now we only support pushing down V2 UDFs that do not have a magic method. Such a UDF is analyzed into an ApplyFunctionExpression, which can be translated into a V2 expression and pushed down. A V2 UDF that has a magic method, however, is analyzed into StaticInvoke or Invoke, which cannot be translated into a V2 expression and therefore cannot be pushed down to the data source. This is a gap, because implementing the magic method is the recommended approach.
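
To make the gap concrete, here is a minimal sketch of the situation before this change. The types below are simplified stand-ins written for illustration, not Spark's real classes, and `translate` is a hypothetical helper, not an actual Spark method.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's real classes.
object PushDownSketch {
  sealed trait Expr

  // What a V2 UDF without a magic method is analyzed into: the function's
  // name is still available on the expression.
  final case class ApplyFunctionExpression(canonicalName: String, args: Seq[String]) extends Expr

  // What a V2 UDF with a static magic method is analyzed into: only the
  // class and the method name ("invoke") survive.
  final case class StaticInvoke(staticObject: Class[_], functionName: String, args: Seq[String])
      extends Expr

  // Hypothetical translator: Some(pushed-down form), or None when the
  // expression cannot be described to the data source.
  def translate(e: Expr): Option[String] = e match {
    case ApplyFunctionExpression(name, args) => Some(s"$name(${args.mkString(", ")})")
    case _: StaticInvoke                     => None // no function name to hand over
  }
}
```

In this model the magic-method form is rejected only because the function's identity is lost during analysis, which is the information the rest of this PR tries to carry along.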

Why are the changes needed?

This PR adds the support of pushing down the V2 UDF that has a magic method.

Does this PR introduce any user-facing change?

Yes. V2 UDFs with the magic method can now be pushed down.

How was this patch tested?

New UTs.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Aug 22, 2023
@ConeyLiu
Contributor Author

Hi, @cloud-fan @rdblue @beliefer @LuciferYang could you help to review this? Thanks a lot.

@ConeyLiu
Contributor Author

Gentle ping @cloud-fan @sunchao, could you please also take a look at this? I'd really appreciate it.

@sunchao
Member

sunchao commented Sep 15, 2023

Apologies @ConeyLiu, I just saw this PR. I think this makes sense. Could you rebase it? I'll review it afterwards.

@ConeyLiu ConeyLiu force-pushed the push-down-udf-with-magic branch from c6f0a84 to 50f09cc Compare September 16, 2023 06:22
@ConeyLiu
Contributor Author

@sunchao that's OK, rebased.

Member

Hmm instead of adding these two parameters, can we instead check staticObject and functionName?

Contributor

+1, we can check if staticObject is a subclass of BoundFunction

Contributor Author

For V2 UDF, the staticObject here is the class of the BoundFunction, and the functionName is the magic name: invoke. However, we need the name and canonicalName to build the UserDefinedScalarFunc.

Contributor

we can instantiate staticObject and call its canonicalName function?

Contributor Author

That doesn't seem easy. The BoundFunction is created by calling the bind method of an UnboundFunction, which is loaded by the FunctionCatalog, and we have no guarantee that the BoundFunction has a no-arg constructor that we could call through reflection.
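
The reflection concern can be sketched as follows. `BoundStrLen` is a hypothetical bound function invented for this example; it assumes an implementation whose only constructor takes an argument captured during binding.

```scala
import scala.util.Try

object ReflectionSketch {
  // Hypothetical bound function whose only constructor takes an argument,
  // as could happen when bind() captures information about the input type.
  final class BoundStrLen(val inputTypeName: String) {
    def canonicalName: String = s"strlen($inputTypeName)"
  }

  // Try to create an instance from the class alone, which is all a
  // translator holding only `staticObject` would have to work with.
  def instantiate(cls: Class[_]): Try[Any] =
    Try(cls.getDeclaredConstructor().newInstance())
}
```

Here `ReflectionSketch.instantiate(classOf[ReflectionSketch.BoundStrLen])` is a `Failure`, because the class declares no no-arg constructor, so `canonicalName` cannot be recovered this way in general.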

Contributor

It's better to reuse StaticInvoke so that any optimizations for it can still apply. How about we add one more parameter, function: Option[ScalarFunction[_]], to it instead of two strings?

Member

I was thinking of extending StaticInvoke somehow, like

case class V2StaticInvoke(wrapped: StaticInvoke, name: String, canonicalName: String) extends InvokeLike

but this way we would still need to override quite a few members in InvokeLike, so it may not be worth it.
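
A rough sketch of why the wrapper gets verbose. The `InvokeLike` trait below is a stand-in with four assumed members chosen for illustration; the real trait has a different and larger surface.

```scala
object WrapperSketch {
  // Stand-in for the shared interface; member names here are assumptions.
  trait InvokeLike {
    def arguments: Seq[String]
    def dataType: String
    def nullable: Boolean
    def deterministic: Boolean
  }

  final case class StaticInvoke(
      arguments: Seq[String],
      dataType: String,
      nullable: Boolean,
      deterministic: Boolean) extends InvokeLike

  // The wrapper adds two names but has to forward every inherited member by hand.
  final case class V2StaticInvoke(wrapped: StaticInvoke, name: String, canonicalName: String)
      extends InvokeLike {
    def arguments: Seq[String] = wrapped.arguments
    def dataType: String = wrapped.dataType
    def nullable: Boolean = wrapped.nullable
    def deterministic: Boolean = wrapped.deterministic
  }
}
```

Every member added to the shared interface later would need another forwarding line, and code that pattern-matches on `StaticInvoke` would not see the wrapper.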

Contributor Author

How about we add one more parameter function: Option[ScalarFunction[_]] to it instead of two strings?

Changed it to this approach, which was also the initial implementation.
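
A minimal sketch of the single-parameter approach, using simplified stand-ins rather than Spark's real classes. The `translate` helper and the reduced field lists are assumptions made for the example.

```scala
object MagicPushDownSketch {
  // Stand-in for the V2 ScalarFunction interface; only the two names matter here.
  trait ScalarFunction {
    def name: String
    def canonicalName: String
  }

  // Stand-in for StaticInvoke with the extra optional parameter discussed above.
  final case class StaticInvoke(
      staticObject: Class[_],
      functionName: String,
      args: Seq[String],
      scalarFunction: Option[ScalarFunction] = None)

  // Stand-in for the V2 expression handed to the data source.
  final case class UserDefinedScalarFunc(name: String, canonicalName: String, children: Seq[String])

  // Only invocations that carry their originating function can be translated.
  def translate(e: StaticInvoke): Option[UserDefinedScalarFunc] =
    e.scalarFunction.map(f => UserDefinedScalarFunc(f.name, f.canonicalName, e.args))
}
```

Ordinary `StaticInvoke` usages leave the parameter unset and behave as before, while magic-method calls carry the function object from which `name` and `canonicalName` are read.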

Contributor

@beliefer beliefer Sep 20, 2023

@cloud-fan @sunchao Although adding ScalarFunction as a new parameter is convenient, it makes StaticInvoke look very strange.
We already call scalarFunc.getClass, scalarFunc.resultType(), scalarFunc.isResultNullable, and scalarFunc.isDeterministic before passing in the new ScalarFunction.

Member

I'm OK with keeping it as is. To me it is just a nit, and a single parameter makes the change less intrusive.

Member

@sunchao sunchao left a comment

LGTM

Contributor

@beliefer beliefer left a comment

LGTM too.

@ConeyLiu ConeyLiu force-pushed the push-down-udf-with-magic branch from a7060ef to 12982ff Compare September 21, 2023 03:19
* @param scalarFunction the [[ScalarFunction]] object if this is calling the magic method of the
* [[ScalarFunction]] otherwise is unset.
*/
case class StaticInvoke(
Contributor

We can override stringArgs in StaticInvoke to exclude the new parameter when it is None, which avoids the golden file changes.

Contributor Author

Updated

@github-actions github-actions bot removed the CONNECT label Sep 21, 2023
@sunchao sunchao closed this in bef11d8 Sep 29, 2023
@sunchao
Member

sunchao commented Sep 29, 2023

Thanks! merged to master branch

@ConeyLiu
Contributor Author

ConeyLiu commented Oct 2, 2023

Thanks @sunchao @beliefer @cloud-fan

@ConeyLiu ConeyLiu deleted the push-down-udf-with-magic branch October 2, 2023 13:40
if (scalarFunction.nonEmpty) {
  super.stringArgs
} else {
  super.stringArgs.take(8)
Contributor

this is fragile. I think super.stringArgs.dropRight(1) is better.
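
The fragility can be illustrated with a small model. `Node` and `WiderNode` are hypothetical case classes invented for this sketch, with far fewer arguments than the real expression, and `productIterator` stands in for the default argument listing.

```scala
object StringArgsSketch {
  // Hypothetical node: two "old" arguments plus a trailing optional one.
  final case class Node(
      functionName: String,
      returnNullable: Boolean,
      scalarFunction: Option[String] = None) {
    // Count-based: correct only while exactly two arguments precede the optional one.
    def argsByCount: Seq[Any] = productIterator.toSeq.take(2)
    // Counted from the end: drops the trailing argument however many precede it.
    def argsByDropRight: Seq[Any] = productIterator.toSeq.dropRight(1)
  }

  // The same node after a new argument is added before the optional one.
  final case class WiderNode(
      functionName: String,
      returnNullable: Boolean,
      isDeterministic: Boolean,
      scalarFunction: Option[String] = None) {
    def argsByCount: Seq[Any] = productIterator.toSeq.take(2) // silently loses isDeterministic
    def argsByDropRight: Seq[Any] = productIterator.toSeq.dropRight(1) // still complete
  }
}
```

With the hard-coded count, `WiderNode` drops the new argument from its string form without any compile error. `dropRight(1)` still assumes the optional argument stays last, but that is a single invariant to maintain rather than a number to keep in sync.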

Contributor Author

@cloud-fan I have submitted a follow-up PR for this.

cloud-fan pushed a commit that referenced this pull request Oct 9, 2023
… magic method

### What changes were proposed in this pull request?

This is a follow-up to #42612 that addresses the comment #42612 (comment).

### Why are the changes needed?

To address comments.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing UTs.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43262 from ConeyLiu/42612-followup.

Authored-by: Xianyang Liu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>