-
Notifications
You must be signed in to change notification settings - Fork 28.9k
[SPARK-29883][SQL] Implement a helper method for aliasing bool_and() and bool_or() #26712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@gatorsmile I have now handled for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, do functions not already have names somewhere to use, that can already be set differently per alias? it looks like that's what nodeName is for, and it's already overridden in the aliases, so I'm missing why this is different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do functions not already have names somewhere to use, that can already be set differently per alias?
I think, No? Currently, when we use alias of bool_and (i.e, every), it will be resolved as a constructor of BoolAnd using FunctionRegistry#expressions, which sets `bool_and' as a nodeName.
Lines 55 to 57 in 9351e3e
| case class BoolAnd(arg: Expression) extends UnevaluableBooleanAggBase(arg) { | |
| override def nodeName: String = "bool_and" | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may not make sense, I don't know this code well, but: is it not that "Exists" needs to customize its nodeName, for example? or does it never exist as such a node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
every here does not have any node. It will be resolved as a BoolAnd.
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
Lines 316 to 320 in 32af700
| expression[BoolAnd]("every"), | |
| expression[BoolAnd]("bool_and"), | |
| expression[BoolOr]("any"), | |
| expression[BoolOr]("some"), | |
| expression[BoolOr]("bool_or"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I don't know enough to evaluate the effect of changing nodeName for all implementations, which seems like a broader change than required, but it does make some sense. Maybe @liancheng or @yhuai has an opinion.
|
@gatorsmile Kindly review this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bad practice to make the Expression mutable for trivial things like this. How about using RuntimeReplaceable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Argument's data types are matched in Analysis phase of planning and its optimizer's task to replace RuntimeReplaceable, correct me if I'm wrong.
So, optimization rules (ReplaceExpressions) will never be applied on queries like select every('true').
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
Line 130 in 85cb388
| ReplaceExpressions, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about
case class BoolAnd(functionName: String, arg: Expression) ... with MultiNamesFunction {
def nodeName = functionName
}
We can update FunctionRegistry.expression to detect MultiNamesFunction and pass name to the constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated PR. cc @cloud-fan
|
ok to test |
| val params = Seq.fill(expressions.size)(classOf[Expression]) | ||
| val f = constructors.find(_.getParameterTypes.toSeq == params).getOrElse { | ||
| val f = constructors.find(e => e.getParameterTypes.toSeq == params | ||
| || e.getParameterTypes.head == classOf[String]).getOrElse { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like it's less hacky to create a new expressionWithAlias method, with only the necessary logic
def expressionWithAlias ... = {
val constructors = tag.runtimeClass.getConstructors
.filter(c => e.getParameterTypes.head == classOf[String])
assert(constructors.length == 1)
try {
constructors.head.newInstance(name, expressions : _*).asInstanceOf[Expression]
} ...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then we don't even need the MultiNamedExpression trait. We just need to register bool_and, bool_or with expressionWithAlias
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan updated as per your suggestions.
|
Test build #114761 has finished for PR 26712 at commit
|
| .filter(_.getParameterTypes.head == classOf[String]) | ||
| assert(constructors.length == 1) | ||
| val builder = (expressions: Seq[Expression]) => { | ||
| Try(constructors.head.newInstance(name.toString, expressions.head).asInstanceOf[Expression]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems better to use the normal try catch?
try {
constructors.head.newInstance(name.toString, expressions.head).asInstanceOf[Expression]
} catch {
// the original comment ...
case e => throw new AnalysisException(e.getCause.getMessage)
}
We can update def expression as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in latest commit.
|
Test build #114764 has finished for PR 26712 at commit
|
|
Test build #114786 has finished for PR 26712 at commit
|
|
Test build #114787 has finished for PR 26712 at commit
|
|
Test build #114797 has finished for PR 26712 at commit
|
HyukjinKwon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me too if tests pass
|
Test build #114828 has finished for PR 26712 at commit
|
|
retest this please |
|
Test build #114842 has finished for PR 26712 at commit
|
|
Test build #114875 has finished for PR 26712 at commit
|
| assert(constructors.length == 1) | ||
| val builder = (expressions: Seq[Expression]) => { | ||
| try { | ||
| constructors.head.newInstance(name.toString, expressions.head).asInstanceOf[Expression] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since, we are not validating arguments, queries like
SELECT EVERY(true, false); will result true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can validate arguments with assert or as used in expression?
cc @cloud-fan @HyukjinKwon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is it done in def expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
Lines 602 to 620 in ebd83a5
| val params = Seq.fill(expressions.size)(classOf[Expression]) | |
| val f = constructors.find(_.getParameterTypes.toSeq == params).getOrElse { | |
| val validParametersCount = constructors | |
| .filter(_.getParameterTypes.forall(_ == classOf[Expression])) | |
| .map(_.getParameterCount).distinct.sorted | |
| val invalidArgumentsMsg = if (validParametersCount.length == 0) { | |
| s"Invalid arguments for function $name" | |
| } else { | |
| val expectedNumberOfParameters = if (validParametersCount.length == 1) { | |
| validParametersCount.head.toString | |
| } else { | |
| validParametersCount.init.mkString("one of ", ", ", " and ") + | |
| validParametersCount.last | |
| } | |
| s"Invalid number of arguments for function $name. " + | |
| s"Expected: $expectedNumberOfParameters; Found: ${params.length}" | |
| } | |
| throw new AnalysisException(invalidArgumentsMsg) | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since, in expressionWithAlias we are always passing expressions.head to function's constructor. We can use assert statement
...
val builder = (expressions: Seq[Expression]) => {
assert(expressions.size == 1,
s"Invalid number of arguments for function $name. " +
s"Expected: 1; Found: ${expressions.size}")
assert(expressions.head == classOf[Expression],
s"Invalid arguments for function $name")
try {
constructors.head.newInstance(name.toString, expressions.head).asInstanceOf[Expression]
}
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW is it possible to do newInstance(name.toString, expressions: _*)? Then it can work for other expressions that take more than 1 parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to do newInstance(name.toString, expressions: _*)?
No, compilation error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about newInstance((name.toString +: expressions): _*)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works. Updated in latest commit. cc @cloud-fan
|
There are other function with alias name example |
I'm fine either way. |
|
Ok, I will handle them in different PR. |
|
Test build #114899 has finished for PR 26712 at commit
|
|
Test build #114903 has finished for PR 26712 at commit
|
|
cc @cloud-fan |
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
Show resolved
Hide resolved
|
cc @cloud-fan |
|
Test build #114940 has finished for PR 26712 at commit
|
|
I'm OK with it if @cloud-fan is. |
| case Failure(e) => | ||
| // the exception is an invocation exception. To get a meaningful message, we need the | ||
| // cause. | ||
| throw new AnalysisException(e.getCause.getMessage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @amanomer . I'm wondering if this change is required in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This reformatting of try-catch block can be raised in different PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I think this is fine and cleaner, so think it's OK to change here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are similar try-catch block format on other files too, which can be reformatted like this.
| case Failure(e) => | ||
| // the exception is an invocation exception. To get a meaningful message, we need the | ||
| // cause. | ||
| throw new AnalysisException(e.getCause.getMessage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
maropu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make the title clearer like Implement a helper method for aliasing registered functions?
|
thanks, merging to master! |
|
Thanks all for reviewing and merging |
### What changes were proposed in this pull request? This PR is to use `expressionWithAlias` for remaining functions for which alias name can be used. Remaining functions are: `Average, First, Last, ApproximatePercentile, StddevSamp, VarianceSamp` PR #26712 introduced `expressionWithAlias` ### Why are the changes needed? Error message is wrong when alias name is used for above mentioned functions. ### Does this PR introduce any user-facing change? No ### How was this patch tested? Manually Closes #26808 from amanomer/fncAlias. Lead-authored-by: Aman Omer <[email protected]> Co-authored-by: Aman Omer <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
This PR introduces a method
expressionWithAliasin classFunctionRegistrywhich is used to register function's constructor. Currently,expressionWithAliasis used to registerBoolAnd&BoolOr.Why are the changes needed?
Error message is wrong when alias name is used for
BoolAnd&BoolOr.Does this PR introduce any user-facing change?
No
How was this patch tested?
Tested manually.
For query,
select every('true');Output before this PR,
After this PR,