[SPARK-30184][SQL] Implement a helper method for aliasing functions #26808
ok to test
Test build #115054 has finished for PR 26808 at commit
  val result = AggregateExpression(
-   aggregate.First(evalWithinGroup(regularGroupId, operator.toAttribute), Literal(true)),
+   aggregate.First("first",
+     evalWithinGroup(regularGroupId, operator.toAttribute), Literal(true)),
It's a bit verbose to always pass "first" for First, so how about defining an auxiliary constructor?
def this(child: Expression, ignoreNullsExpr: Expression) =
  this("first", child, ignoreNullsExpr)
Done
If the tests pass, LGTM except for one minor comment
btw, why did you include fixes for Average and ApproximatePercentile in this PR?
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
   */
  def first(e: Column, ignoreNulls: Boolean): Column = withAggregateFunction {
-   new First(e.expr, Literal(ignoreNulls))
+   new First("first", e.expr, Literal(ignoreNulls))
We can add a constructor to provide a default name.
Done
spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala Lines 303 to 304 in be867e8
Test build #115081 has finished for PR 26808 at commit
Test build #115082 has finished for PR 26808 at commit
  // Select the result of the first aggregate in the last aggregate.
  val result = AggregateExpression(
-   aggregate.First(evalWithinGroup(regularGroupId, operator.toAttribute), Literal(true)),
+   new aggregate.First(evalWithinGroup(regularGroupId, operator.toAttribute), Literal(true)),
So adding more constructors doesn't help us reduce the diff. How about:
case class First(funcName: String, child: Expression, ignoreNullsExpr: Expression) {
  def this(funcName: String, child: Expression)
}
object First {
  def apply(child: Expression, ignoreNullsExpr: Expression) ...
  def apply(child: Expression) ...
}
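One way to read the suggestion above, as a minimal stand-alone sketch: the case class keeps `funcName` as its first field, while `apply` overloads on the companion object supply the default name so that existing call sites of the form `First(child, ignoreNulls)` stay unchanged. `Expr`, `Attr` and `BoolLit` are hypothetical stand-ins for Catalyst's `Expression`, an attribute reference and `Literal`; the elided bodies in the comment are filled in here only as an assumption.

```scala
// Minimal stand-ins for Catalyst types; these are NOT Spark's real classes.
sealed trait Expr
final case class Attr(name: String) extends Expr
final case class BoolLit(value: Boolean) extends Expr

// The primary constructor takes the function name; the secondary constructor
// defaults ignoreNullsExpr to false (an assumed default for this sketch).
final case class First(funcName: String, child: Expr, ignoreNullsExpr: Expr) extends Expr {
  def this(funcName: String, child: Expr) = this(funcName, child, BoolLit(false))
}

object First {
  // Overloads that default funcName to "first", so call sites that do not
  // care about the alias need neither `new` nor an explicit name.
  def apply(child: Expr, ignoreNullsExpr: Expr): First =
    new First("first", child, ignoreNullsExpr)

  def apply(child: Expr): First =
    new First("first", child, BoolLit(false))
}
```

With this shape, only the function registry has to pass a name explicitly; every other call site compiles as before, which is what keeps the diff small.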
Done
Test build #115106 has finished for PR 26808 at commit
Test build #115109 has finished for PR 26808 at commit
  // 1. Maven can't get correct resource directory when resources in other jars.
  // 2. We test subclasses in the hive-thriftserver module.
  val sparkHome = {
+   sys.props.put("spark.test.home", "/home/root1/spark")
?
Oops... I'll revert this.
maropu
left a comment
LGTM except for one comment.
Test build #115142 has finished for PR 26808 at commit
Test build #115141 has finished for PR 26808 at commit
  val evaluator = DeclarativeAggregateEvaluator(Last(input, Literal(false)), Seq(input))
- val evaluatorIgnoreNulls = DeclarativeAggregateEvaluator(Last(input, Literal(true)), Seq(input))
+ val evaluatorIgnoreNulls = DeclarativeAggregateEvaluator(
+   Last(input, Literal(true)), Seq(input))
nit: unnecessary change
Test build #115524 has finished for PR 26808 at commit
Test build #115526 has finished for PR 26808 at commit
Test build #115528 has finished for PR 26808 at commit
  /** See usage above. */
- private def expression[T <: Expression](name: String)
+ private def expression[T <: Expression](name: String, isAliasName: Boolean = false)
nit: setAlias: Boolean = false
    throw new UnsupportedOperationException(s"$nodeName does not implement simpleStringWithNodeId")
  }

+ val FUNC_ALIAS = TreeNodeTag[String]("functionAliasName")
We can define it in object FunctionRegistry. This is kind of a map key, and we don't need to create a different instance for each expression.
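The "map key" point can be illustrated with a small stand-alone sketch. `Tag`, `Node` and `Registry` are hypothetical stand-ins for Catalyst's `TreeNodeTag`, `TreeNode` and `FunctionRegistry`, not the real classes; only the name `FUNC_ALIAS` and the string "functionAliasName" are taken from the diff.

```scala
import scala.collection.mutable

// Hypothetical stand-in for TreeNodeTag: a typed key into a per-node map.
final case class Tag[T](name: String)

// Hypothetical stand-in for a tree node that can carry tagged values.
class Node {
  private val tags = mutable.Map.empty[Tag[_], Any]

  def setTagValue[T](tag: Tag[T], value: T): Unit = tags(tag) = value

  // The cast is safe as long as values are only written through setTagValue.
  def getTagValue[T](tag: Tag[T]): Option[T] = tags.get(tag).map(_.asInstanceOf[T])
}

// The key lives in a singleton object, so it is created once and shared by
// every node instead of being allocated as a field of each expression.
object Registry {
  val FUNC_ALIAS: Tag[String] = Tag[String]("functionAliasName")
}
```

Each node still stores its own value; only the key is shared, which is why a per-expression `val` is unnecessary.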
  expression[HyperLogLogPlusPlus]("approx_count_distinct"),
- expression[Average]("avg"),
+ expressionWithAlias[Average]("avg"),
+ expressionWithAlias[Average]("mean"),
Seems we don't need expressionWithAlias at all, just call expression[Average]("mean", setAlias = true).
And expression[Average]("avg") can remain unchanged, as avg is already the default name.
And we don't need to reorder them now.
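A rough sketch of the registration style proposed in this thread, assuming a single helper with a `setAlias` flag. `MiniRegistry` and this `Expr` are hypothetical simplifications: the real `FunctionRegistry` builds expressions reflectively from a type parameter and stores the alias as a tree-node tag, whereas this sketch takes an explicit factory argument and uses a plain field.

```scala
import scala.collection.mutable

// Hypothetical simplified expression; `alias` stands in for the FUNC_ALIAS tag.
class Expr(val defaultName: String) {
  var alias: Option[String] = None
  def prettyName: String = alias.getOrElse(defaultName)
}

class Average extends Expr("avg")

object MiniRegistry {
  private val builders = mutable.Map.empty[String, () => Expr]

  // Registers `name`. With setAlias = true, every expression built through
  // this entry records the name under which it was registered.
  def expression(name: String, setAlias: Boolean = false)(make: => Expr): Unit =
    builders(name) = () => {
      val e = make
      if (setAlias) e.alias = Some(name)
      e
    }

  def lookup(name: String): Expr = builders(name)()

  // "avg" is already the default name, so it needs no alias.
  expression("avg")(new Average)
  expression("mean", setAlias = true)(new Average)
}
```

Because the default-name entry is registered exactly as before, its line in the registry does not change and the entries do not need to be reordered.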
  }

- override def prettyName: String = "percentile_approx"
+ override def nodeName: String = getTagValue(FUNC_ALIAS).getOrElse("percentile_approx")
let's override prettyName as before.
  override lazy val evaluateExpression: AttributeReference = first

- override def toString: String = s"first($child)${if (ignoreNulls) " ignore nulls"}"
+ override def nodeName: String = getTagValue(FUNC_ALIAS).getOrElse("first")
ditto, override prettyName seems better.
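A small sketch of why overriding `prettyName` may be the less invasive choice, assuming (as in Catalyst, to the best of my understanding) that `nodeName` identifies the node in plan output while `prettyName` is the user-facing function name. The `Expr` base class here is hypothetical and much simpler than Spark's `Expression`.

```scala
// Hypothetical stand-in for an expression base class; not Spark's real API.
abstract class Expr(val nodeName: String) {
  var alias: Option[String] = None            // stands in for getTagValue(FUNC_ALIAS)
  def prettyName: String = nodeName.toLowerCase
}

class First extends Expr("First") {
  // Only the user-facing name follows the alias; nodeName keeps identifying
  // the class regardless of how the function was invoked.
  override def prettyName: String = alias.getOrElse("first")
}
```

In this sketch, setting the alias to "first_value" changes `prettyName` but leaves `nodeName` as "First".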
in general looks good, thanks for updating!
cc @cloud-fan
Test build #115559 has finished for PR 26808 at commit
Tests have passed. cc @cloud-fan @maropu
thanks, merging to master!
…n SQL expressions

### What changes were proposed in this pull request?

This PR is kind of a followup of #26808. It leverages the helper method for aliasing in built-in SQL expressions to use the alias as its output column name where it's applicable.

- `Expression`, `UnaryMathExpression` and `BinaryMathExpression` search the alias in the tags by default.
- When the naming is different in its implementation, it has to be overwritten for the expression specifically. E.g., `CallMethodViaReflection`, `Remainder`, `CurrentTimestamp`, `FormatString` and `XPathDouble`.

This PR fixes the aliases of the functions below:

| class                    | alias            |
|--------------------------|------------------|
|`Rand`                    |`random`          |
|`Ceil`                    |`ceiling`         |
|`Remainder`               |`mod`             |
|`Pow`                     |`pow`             |
|`Signum`                  |`sign`            |
|`Chr`                     |`char`            |
|`Length`                  |`char_length`     |
|`Length`                  |`character_length`|
|`FormatString`            |`printf`          |
|`Substring`               |`substr`          |
|`Upper`                   |`ucase`           |
|`XPathDouble`             |`xpath_number`    |
|`DayOfMonth`              |`day`             |
|`CurrentTimestamp`        |`now`             |
|`Size`                    |`cardinality`     |
|`Sha1`                    |`sha`             |
|`CallMethodViaReflection` |`java_method`     |

Note: `EqualTo`, `=` and `==` aliases were excluded because it's unable to leverage this helper method. It should fix the parser.

Note: this PR also excludes some instances such as `ToDegrees`, `ToRadians`, `UnaryMinus` and `UnaryPositive` that needs an explicit name overwritten to make the scope of this PR smaller.

### Why are the changes needed?

To respect expression name.

### Does this PR introduce any user-facing change?

Yes, it will change the output column name.

### How was this patch tested?

Manually tested, and unittests were added.

Closes #27901 from HyukjinKwon/31146.

Authored-by: HyukjinKwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
…n SQL expressions Closes #27901 from HyukjinKwon/31146. (cherry picked from commit 6704103) Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
This PR uses `expressionWithAlias` for the remaining functions for which an alias name can be used. The remaining functions are: Average, First, Last, ApproximatePercentile, StddevSamp, VarianceSamp.
PR #26712 introduced `expressionWithAlias`.
Why are the changes needed?
The error message is wrong when an alias name is used for the above-mentioned functions.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Manually