-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-44913][SQL] DS V2 supports push down V2 UDF that has magic method #42612
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi, @cloud-fan @rdblue @beliefer @LuciferYang could you help to review this? Thanks a lot. |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/V2ExpressionUtils.scala
Outdated
Show resolved
Hide resolved
|
Gentle ping @cloud-fan @sunchao, could you please also take a look at this? Really appreciate. |
|
Apologies @ConeyLiu , just saw this PR. I think this makes sense. Could you rebase it? I'll review afterwards. |
c6f0a84 to
50f09cc
Compare
|
@sunchao that's OK, rebased. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm instead of adding these two parameters, can we instead check staticObject and functionName?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, we can check if staticObject is a subclass of BoundFunction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For V2 UDF, the staticObject here is the class of the BoundFunction, and the functionName is the magic name: invoke. However, we need the name and canonicalName to build the UserDefinedScalarFunc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can instantiate staticObject and call its canonicalName function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems is not easy. The BoundFunction is created by calling the bind method of UnboundFunction which is loaded by FunctionCatalog. And we have no guarantee the BoundFunction has no-args constructor to create with reflect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to reuse StaticInoke so that any optimizations for it can still apply. How about we add one more parameter function: Option[ScalarFunction[_]] to it instead of two strings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking extending StaticInvoke somehow like
case class V2StaticInvoke(wrapped: StaticInvoke, name: String, canonicalName: String) extends InvokeLikebut this way we still need to override quite a few members in InvokeLike so may not worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we add one more parameter function: Option[ScalarFunction[_]] to it instead of two strings?
Changed to this way. And it is the initial implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan @sunchao Although adding ScalarFunction as a new parameter is convenient, it makes StaticInvoke look very strange.
We already call the scalarFunc.getClass, scalarFunc.resultType(), scalarFunc.isResultNullable and scalarFunc.isDeterministic before passed the new ScalarFunction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK to keep as it is. For me it is just a nit and one parameter makes the change less intrusive.
sunchao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
beliefer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM too.
a7060ef to
12982ff
Compare
| * @param scalarFunction the [[ScalarFunction]] object if this is calling the magic method of the | ||
| * [[ScalarFunction]] otherwise is unset. | ||
| */ | ||
| case class StaticInvoke( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can override stringArgs in StaticInvoke, to exclude the new parameter if it's None, to avoid the golden file changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
|
Thanks! merged to master branch |
|
Thanks @sunchao @beliefer @cloud-fan |
| if (scalarFunction.nonEmpty) { | ||
| super.stringArgs | ||
| } else { | ||
| super.stringArgs.take(8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is fragile. I think super.stringArgs.dropRight(1) is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan I have submitted a follow-up PR for this.
… magic method ### What changes were proposed in this pull request? This is follow up #42612 and to address the comment #42612 (comment) ### Why are the changes needed? To address comments. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing UTs. ### Was this patch authored or co-authored using generative AI tooling? No Closes #43262 from ConeyLiu/42612-followup. Authored-by: Xianyang Liu <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
Right now we only support pushing down the V2 UDF that has not a magic method. Because the V2 UDF will be analyzed into the
ApplyFunctionExpressionwhich could be translated and pushed down. However, a V2 UDF that has the magic method will be analyzed intoStaticInvokeorInvokethat can not be translated into V2 expression and then can not be pushed down to the data source. The magic method is suggested.Why are the changes needed?
This PR adds the support of pushing down the V2 UDF that has a magic method.
Does this PR introduce any user-facing change?
Yes, now the V2 UDF with the magic method could be pushed down.
How was this patch tested?
New UTs.
Was this patch authored or co-authored using generative AI tooling?
No