Conversation

@maryannxue
Contributor

@maryannxue commented Jul 5, 2018

What changes were proposed in this pull request?

  1. Extend the Parser to enable parsing a column list as the pivot column.
  2. Extend the Parser and the Pivot node to enable parsing complex expressions with aliases as the pivot value.
  3. Add type check and constant check in Analyzer for Pivot node.
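For illustration, the extended syntax from items 1 and 2 might be exercised as follows. This is only a sketch, reusing the courseSales table and column names that appear in the review snippets further down; the value aliases are made up and the query is not taken verbatim from pivot.sql.

// Sketch only: a multi-column pivot column plus aliased aggregates and pivot values.
spark.sql("""
  SELECT * FROM (
    SELECT year, course, earnings FROM courseSales
  )
  PIVOT (
    sum(earnings) s, avg(earnings) a
    FOR (course, year) IN (('dotNET', 2012) AS dotNET2012, ('Java', 2013) AS java2013)
  )
""").show()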

How was this patch tested?

Add tests in pivot.sql

PIVOT (
sum(e) s, avg(e) a
FOR y IN (2012, 2013)
FOR y IN (2012 as firstYear, 2013 secondYear)
Member

Can we keep the original query and add a new one for this?

@SparkQA

SparkQA commented Jul 6, 2018

Test build #92656 has finished for PR 21720 at commit 942a30d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

struct<>
-- !query 20 output
org.apache.spark.SparkException
Exception thrown in awaitResult:
Member

?

aggregates.foreach { e =>
  if (!isAggregateExpression(e)) {
    throw new AnalysisException(
      s"Aggregate expression required for pivot, found '$e'")
Member

Add a test case for this exception?

SELECT * FROM (
  SELECT year, course, earnings FROM courseSales
)
PIVOT (
  sum(earnings), year
  FOR course IN ('dotNET', 'Java')
)

val evalPivotValues = pivotValues.map { value =>
  if (!Cast.canCast(value.dataType, pivotColumn.dataType)) {
    throw new AnalysisException(s"Invalid pivot value '$value': " +
      s"value data type ${value.dataType.simpleString} does not match " +
Member

simpleString -> catalogString

try {
  Cast(value, pivotColumn.dataType).eval(EmptyRow)
} catch {
  case _: UnsupportedOperationException =>
Member

Do not use try catch for these cases.

          if (value.foldable) {
            Cast(value, pivotColumn.dataType).eval(EmptyRow)
          } else {
            throw new AnalysisException(
              s"Literal expressions required for pivot values, found '$value'")
          }

Member

We should check whether the value is foldable before checking whether its type is castable.
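A minimal sketch of the reordered checks, assuming the names used in the snippets above (the tail of the second error message is guessed from context, so this is not the exact committed code):

// Sketch: check foldability first, then castability, then evaluate the cast.
val evalPivotValues = pivotValues.map { value =>
  if (!value.foldable) {
    throw new AnalysisException(
      s"Literal expressions required for pivot values, found '$value'")
  }
  if (!Cast.canCast(value.dataType, pivotColumn.dataType)) {
    throw new AnalysisException(s"Invalid pivot value '$value': " +
      s"value data type ${value.dataType.catalogString} does not match " +
      s"pivot column data type ${pivotColumn.dataType.catalogString}")
  }
  Cast(value, pivotColumn.dataType).eval(EmptyRow)
}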

def ifExpr(expr: Expression) = {
If(EqualNullSafe(pivotColumn, value), expr, Literal(null))
def ifExpr(e: Expression) = {
If(EqualNullSafe(pivotColumn, Cast(value, pivotColumn.dataType)), e, Literal(null))
Member

Need to consider the timezone: Cast(value, pivotColumn.dataType, Some(conf.sessionLocalTimeZone))

Contributor Author

Is it required in the other Cast(value, pivotColumn.dataType) above?

Member

@MaxGekk left a comment


pivotColumn
    : identifiers+=identifier
    | '(' identifiers+=identifier (',' identifiers+=identifier)* ')'
Member

Are there any specific reasons to restrict the pivotColumn to identifiers? Are there any cases where expressions are still not supported properly with your changes?

Contributor Author

The main reason was that I implemented this pivot SQL support based on the Oracle grammar. Please take a look at https://docs.oracle.com/database/121/SQLRF/img_text/pivot_for_clause.htm. Note that the "column" here is different from "expression" (see https://docs.oracle.com/cd/B28359_01/server.111/b28286/expressions002.htm#SQLRF52047 for reference).
Another reason was that relaxing it to an "expr" would require a lot more tests and handling of special cases.
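To make the restriction concrete, here is a hedged example of the identifier-list form the grammar above accepts (table and column names assumed from the other snippets in this thread):

// Accepted: the pivot column is a parenthesized list of plain column identifiers.
spark.sql("""
  SELECT * FROM courseSales
  PIVOT (
    sum(earnings)
    FOR (course, year) IN (('dotNET', 2012), ('Java', 2013))
  )
""")
// By contrast, a general expression such as FOR (year + 1) IN (2013, 2014) is not covered
// by this rule, since only identifiers may appear in the pivot column list.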

case p: Pivot if !p.childrenResolved || !p.aggregates.forall(_.resolved)
|| (p.groupByExprsOpt.isDefined && !p.groupByExprsOpt.get.forall(_.resolved))
|| !p.pivotColumn.resolved => p
|| !p.pivotColumn.resolved || !p.pivotValues.forall(_.resolved) => p
Member

By which test is the change covered?

Contributor Author

Before this PR, pivot values could only be single literals (no structs), so they were converted to Literals in AstBuilder. Now they are expressions and will be handled in this Analyzer rule.

*/
case class Pivot(
    groupByExprsOpt: Option[Seq[NamedExpression]],
    pivotColumn: Expression,
Member

I am asking just for my understanding: if you support multiple pivot columns, why is it not declared explicitly here as pivotColumns: Seq[Expression], like pivotValues?

Contributor Author

No. The pivot column is one expression, which can be either 1) a single column reference or 2) a struct of multiple columns. Either way, the list of pivot values has a many-to-one mapping to the pivot column.
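A rough sketch of that representation in terms of Catalyst expressions (my reading of the design, not the committed code): the parenthesized column list becomes a single struct-valued pivot column, and each pivot value is a struct of the same shape.

import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
import org.apache.spark.sql.catalyst.expressions.{CreateStruct, Literal}

// FOR (course, year) IN (('dotNET', 2012), ('Java', 2013)): one pivot column, many values.
val pivotColumn = CreateStruct(Seq(
  UnresolvedAttribute.quoted("course"),
  UnresolvedAttribute.quoted("year")))
val pivotValues = Seq(
  CreateStruct(Seq(Literal("dotNET"), Literal(2012))),
  CreateStruct(Seq(Literal("Java"), Literal(2013))))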

struct<>
-- !query 19 output
org.apache.spark.SparkException
Job 17 cancelled because SparkContext was shut down
Member

Is this the expected output?

Contributor Author

No... sorry about this. There must have been a mistake. I'll commit this file again.

.map(typedVisit[Expression])
val pivotColumn = UnresolvedAttribute.quoted(ctx.pivotColumn.getText)
val pivotValues = ctx.pivotValues.asScala.map(typedVisit[Expression]).map(Literal.apply)
val pivotColumn = if (ctx.pivotColumn.identifiers.size == 1) {
Member

Are there any reasons to handle one pivot column separately? And what happens if size == 0?

Contributor Author

It cannot be 0, as that is required by the parser rule. If size == 1, then it's a single column as before; otherwise it's a struct.
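A hedged reconstruction of that branch (the hunk above is truncated, so the else side is a guess rather than the committed code; asScala comes from scala.collection.JavaConverters._):

// Sketch: one identifier stays a plain attribute reference; several become a struct.
val pivotColumn = if (ctx.pivotColumn.identifiers.size == 1) {
  UnresolvedAttribute.quoted(ctx.pivotColumn.identifiers.get(0).getText)
} else {
  CreateStruct(ctx.pivotColumn.identifiers.asScala.map(
    id => UnresolvedAttribute.quoted(id.getText)))
}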

} catch {
  case _: UnsupportedOperationException =>
    throw new AnalysisException(
      s"Literal expressions required for pivot values, found '$value'")
Member

Is UnsupportedOperationException raised only when the value is not a literal? Could you check that it is a literal earlier?

Contributor Author

Yes, you are right. Please refer to @gatorsmile's comment.

@SparkQA

SparkQA commented Jul 10, 2018

Test build #92792 has finished for PR 21720 at commit d468821.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maryannxue
Contributor Author

retest please

@maryannxue
Contributor Author

retest this please

@SparkQA

SparkQA commented Jul 10, 2018

Test build #92823 has finished for PR 21720 at commit d468821.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

ping @maryannxue Resolve the conflicts? Will review it again after that.

@SparkQA

SparkQA commented Jul 14, 2018

Test build #92993 has finished for PR 21720 at commit b27245e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

retest this please

@SparkQA

SparkQA commented Jul 16, 2018

Test build #93138 has finished for PR 21720 at commit b27245e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maryannxue
Contributor Author

retest this please

@SparkQA

SparkQA commented Jul 17, 2018

Test build #93152 has finished for PR 21720 at commit b27245e.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maryannxue
Contributor Author

retest this please

@SparkQA

SparkQA commented Jul 17, 2018

Test build #93182 has finished for PR 21720 at commit b27245e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

LGTM

Thanks! Merged to master.

@asfgit closed this in cd203e0 Jul 18, 2018
@MaxGekk
Member

MaxGekk commented Jul 18, 2018

@gatorsmile @maryannxue Can we move forward with this PR: #21699 ?

@patricker

@maryannxue I know this is an old PR, but it doesn't actually include SPARK-24163. Can the Jira ticket be re-opened for SPARK-24163?
