Skip to content

Conversation

@maryannxue
Copy link
Contributor

What changes were proposed in this pull request?

Relax the check to allow complex aggregate expressions, like ceil(sum(col1)) or sum(col1) + 1, which roughly means any aggregate expression that could appear in an Aggregate plan except pandas UDF (due to the fact that it is not supported in pivot yet).

How was this patch tested?

Added 2 tests in pivot.sql

case AggregateExpression(_, _, _, _) => true
case _ => false
}
// TODO: Support Pandas UDF.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment about the check and explain what is allowed?

@SparkQA
Copy link

SparkQA commented Jul 12, 2018

Test build #92901 has finished for PR 21753 at commit 93d2bde.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Jul 12, 2018

Actually, this is a bug? In the current master, the exception says;

org.apache.spark.sql.AnalysisException
Aggregate expression required for pivot, found 'CEIL(sum(cast(earnings#x as bigint)))';

sum is a aggregation function though, the exception says it is not?

struct<>
-- !query 14 output
org.apache.spark.sql.AnalysisException
It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query.;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is related to this pr? I think the output does not change with/without this pr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I think it's still worth adding such a test for pivot.
But you reminded me that I might not need to check the aggregate function arguments here and leave it to CheckAnalysis since this check is independent of the context and always outputs the same error message. WDYT, @maropu and @gatorsmile ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this test is just to improve the test coverage. It looks reasonable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But you reminded me that I might not need to check the aggregate function arguments here and leave it to CheckAnalysis since this check is independent of the context and always outputs the same error message.

The general principle in our Analyzer is do the error handling in CheckAnalysis, unless a better (more readable) error message can be issued from the rule.

@maryannxue
Copy link
Contributor Author

@maropu It's not a bug. It works as I specified in my original PR, and you can also refer to https://docs.oracle.com/database/121/SQLRF/img_text/pivot_clause.htm, which only allows a form of "agg_func(expr)". ceil(sum(...) is an aggregate expression, but not in that form.

@SparkQA
Copy link

SparkQA commented Jul 12, 2018

Test build #92941 has finished for PR 21753 at commit e5d4384.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// TODO: Support Pandas UDF.
private def checkValidAggregateExpression(expr: Expression): Unit = expr match {
case _: AggregateExpression => // OK and leave the argument check to CheckAnalysis.
case expr: PythonUDF if PythonUDF.isGroupedAggPandasUDF(expr) =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a JIRA for this support https://issues.apache.org/jira/browse/SPARK-24796

@gatorsmile
Copy link
Member

LGTM

Thanks! Merged to master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants