Skip to content

Conversation

@gatorsmile
Copy link
Member

What changes were proposed in this pull request?

This PR is to support group by position in SQL. For example, when users input the following query

select c1 as a, c2, c3, sum(*) from tbl group by 1, 3, c4

The ordinals are recognized as the positions in the select list. Thus, Analyzer converts it to

select c1, c2, c3, sum(*) from tbl group by c1, c3, c4

This is controlled by the config option spark.sql.groupByOrdinal.

  • When true, the ordinal numbers in group by clauses are treated as the position in the select list.
  • When false, the ordinal numbers are ignored.
  • Only convert integer literals (not foldable expressions). If found foldable expressions, ignore them.
  • When the positions specified in the group by clauses correspond to the aggregate functions in select list, output an exception message.
  • star is not allowed to use in the select list when users specify ordinals in group by

Note: This PR is taken from #10731. When merging this PR, please give the credit to @zhichao-li

Also cc all the people who are involved in the previous discussion: @rxin @cloud-fan @marmbrus @yhuai @hvanhovell @adrian-wang @chenghao-intel @tejasapatil

How was this patch tested?

Added a few test cases for both positive and negative test cases.

gatorsmile and others added 30 commits November 13, 2015 14:50
@SparkQA
Copy link

SparkQA commented Mar 22, 2016

Test build #53723 has finished for PR 11846 at commit b61345b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
val newGroups = groups.map {
case IntegerIndex(index) if index > 0 && index <= aggs.size =>
aggs(index - 1) match {
case Alias(c, _) if c.isInstanceOf[AggregateExpression] =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about sum(a) + 1? I think we need to use TreeNode.find to check if there are any agg functions inside it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a method called cotainsAggregate somewhere, we should call it here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uh, yeah! let me fix it and add a test case. Thanks!

@SparkQA
Copy link

SparkQA commented Mar 22, 2016

Test build #53737 has finished for PR 11846 at commit 18bab66.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 22, 2016

Test build #53746 has finished for PR 11846 at commit b19b73c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

} else {
val expanded = a.aggregateExpressions.flatMap {
case s: Star => s.expand(a.child, resolver)
case u @ UnresolvedAlias(_: Star, _) => expandStarExpression(u.child, a.child) :: Nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when will we hit this branch?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

select * from tab group by col1, col2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why doesn't Project have this case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is intentionally added by CatalystQl. I can double check if this is the root cause.

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/CatalystQl.scala#L224-L225

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading the code, Project still has a problem in star expansion:

      val structDf = testData2.select("a", "b").as("record")
      structDf.select(hash($"record.*"))

Sorry, the previous PR does not cover all the cases. Let me submit a separate PR to handle all the star expansion.

Of course, if we want to limit the support of star expansion in group by, we can do it for sure.

@SparkQA
Copy link

SparkQA commented Mar 23, 2016

Test build #53876 has finished for PR 11846 at commit 74a16be.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

# Conflicts:
#	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@SparkQA
Copy link

SparkQA commented Mar 24, 2016

Test build #54059 has finished for PR 11846 at commit a06c4ce.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Copy link
Member Author

retest this please.

@SparkQA
Copy link

SparkQA commented Mar 24, 2016

Test build #54098 has finished for PR 11846 at commit a06c4ce.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// which is a 1-base position of the projection list.
case s @ Sort(orders, global, child)
if conf.orderByOrdinal && orders.exists(o => IntegerIndex.unapply(o.child).nonEmpty) =>
if conf.orderByOrdinal && child.resolved &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add a case plan if !plan.childrenResolved => plan at the beginning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me do it. Thanks!

Will use p instead of plan since plan causes a warning by IntelliJ compiler for possible shadowing.

@cloud-fan
Copy link
Contributor

LGTM except one minor comment, thanks for working on it!

@gatorsmile
Copy link
Member Author

Thank you for your detailed review! :-)

@gatorsmile
Copy link
Member Author

retest this please

@SparkQA
Copy link

SparkQA commented Mar 25, 2016

Test build #54138 has finished for PR 11846 at commit 6d08009.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in 05f652d Mar 25, 2016
@cloud-fan
Copy link
Contributor

Thanks, merging to master!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants