[SPARK-24790][SQL] Allow complex aggregate expressions in Pivot #21753
```scala
    case AggregateExpression(_, _, _, _) => true
    case _ => false
  }
  // TODO: Support Pandas UDF.
```
Please add a comment about this check explaining what is allowed.
Test build #92901 has finished for PR 21753 at commit
Actually, is this a bug? In the current master, the exception says:
```
struct<>
-- !query 14 output
org.apache.spark.sql.AnalysisException
It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query.;
```
Is this test related to this PR? I think the output is the same with or without this PR.
You are right. I think it's still worth adding such a test for pivot.
But you reminded me that I might not need to check the aggregate function arguments here and can leave that to CheckAnalysis, since this check is independent of the context and always outputs the same error message. WDYT, @maropu and @gatorsmile?
Adding this test is just to improve the test coverage. It looks reasonable.
> But you reminded me that I might not need to check the aggregate function arguments here and leave it to CheckAnalysis since this check is independent of the context and always outputs the same error message.

The general principle in our Analyzer is to do the error handling in CheckAnalysis, unless a better (more readable) error message can be issued from the rule.
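A toy sketch of that division of labor, assuming a hypothetical `Expr` tree rather than Catalyst's real classes (`Attr`, `AggExpr`, `Call`, and `NestedAggCheck` are illustrative names, not Spark APIs). The nested-aggregate rule only inspects an aggregate and its argument, never the enclosing plan, so a single pass (standing in for CheckAnalysis here) can emit the same message seen in the query 14 output above whether the aggregate came from a plain GROUP BY or from a PIVOT clause.

```scala
// Hypothetical toy expression tree; NOT Catalyst's Expression hierarchy.
sealed trait Expr { def children: Seq[Expr] }
final case class Attr(name: String) extends Expr { def children: Seq[Expr] = Nil }
// An aggregate function call such as sum(col1).
final case class AggExpr(fn: String, arg: Expr) extends Expr { def children: Seq[Expr] = Seq(arg) }
// Any non-aggregate function or operator, e.g. ceil(...) or abs(...).
final case class Call(fn: String, args: Seq[Expr]) extends Expr { def children: Seq[Expr] = args }

object NestedAggCheck {
  final case class AnalysisError(msg: String) extends Exception(msg)

  // True if `e` itself or any of its descendants is an aggregate.
  def containsAgg(e: Expr): Boolean = e match {
    case _: AggExpr => true
    case other      => other.children.exists(containsAgg)
  }

  // Context-independent check: it never looks at the surrounding plan,
  // so the same rule and message apply to GROUP BY and PIVOT alike.
  def check(e: Expr): Unit = e match {
    case AggExpr(_, arg) if containsAgg(arg) =>
      throw AnalysisError(
        "It is not allowed to use an aggregate function in the argument of another " +
          "aggregate function. Please use the inner aggregate function in a sub-query.")
    case other =>
      other.children.foreach(check)
  }
}
```

For example, `sum(avg(col1))` is rejected, while `ceil(sum(col1))` passes because its only aggregate takes a plain column as its argument.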
@maropu It's not a bug. It works as I specified in my original PR; you can also refer to https://docs.oracle.com/database/121/SQLRF/img_text/pivot_clause.htm, which only allows the form `agg_func(expr)`.
Test build #92941 has finished for PR 21753 at commit
```scala
  // TODO: Support Pandas UDF.
  private def checkValidAggregateExpression(expr: Expression): Unit = expr match {
    case _: AggregateExpression => // OK and leave the argument check to CheckAnalysis.
    case expr: PythonUDF if PythonUDF.isGroupedAggPandasUDF(expr) =>
```
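The fragment above stops at the pandas UDF case. Here is a minimal, self-contained sketch of the top-down walk it suggests, under stated assumptions: the toy `Expr` classes, the error messages, and the last two cases (rejecting a bare column and recursing into the children of a non-aggregate wrapper) are illustrative guesses, not Spark's actual code. Only the first two cases mirror the fragment.

```scala
// Hypothetical toy expression tree; NOT Catalyst's Expression hierarchy.
sealed trait Expr { def children: Seq[Expr] }
final case class Attr(name: String) extends Expr { def children: Seq[Expr] = Nil }
final case class Lit(value: Int) extends Expr { def children: Seq[Expr] = Nil }
// An aggregate function call such as sum(col1).
final case class AggExpr(fn: String, arg: Expr) extends Expr { def children: Seq[Expr] = Seq(arg) }
// Stand-in for a grouped-aggregate pandas UDF (assumption for illustration).
final case class PandasAggUdf(name: String, arg: Expr) extends Expr { def children: Seq[Expr] = Seq(arg) }
// Any non-aggregate function or operator, e.g. ceil(...) or ... + 1.
final case class Call(fn: String, args: Seq[Expr]) extends Expr { def children: Seq[Expr] = args }

object PivotCheck {
  final case class AnalysisError(msg: String) extends Exception(msg)

  // Walk the pivot aggregate expression from the root down.
  def checkValidAggregateExpression(expr: Expr): Unit = expr match {
    case _: AggExpr =>
      // OK: stop here; checks on the aggregate's argument are left to a later pass.
      ()
    case _: PandasAggUdf =>
      throw AnalysisError("Pandas UDF aggregate expressions are not supported in pivot.")
    case a: Attr =>
      // Assumed case: a column reached outside any aggregate is rejected.
      throw AnalysisError(s"Column '${a.name}' must appear inside an aggregate function.")
    case other =>
      // Assumed case: a non-aggregate wrapper (ceil, +, ...) is valid if every child is.
      other.children.foreach(checkValidAggregateExpression)
  }
}
```

With this shape, `ceil(sum(col1))` and `sum(col1) + 1` pass because the walk stops as soon as it reaches an aggregate, while `col1 + 1` fails at the bare column. Stopping at the aggregate is what lets the nested-aggregate check stay in a separate, context-independent pass.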
I created a JIRA to track this support: https://issues.apache.org/jira/browse/SPARK-24796
LGTM. Thanks! Merged to master.
What changes were proposed in this pull request?
Relax the check to allow complex aggregate expressions, like `ceil(sum(col1))` or `sum(col1) + 1`, which roughly means any aggregate expression that could appear in an Aggregate plan except pandas UDF (since pandas UDFs are not yet supported in pivot).
How was this patch tested?
Added 2 tests in pivot.sql