@dereksabryfb

If the GROUP BY clause contains an integer n, it is replaced with the nth expression in the SELECT clause, so that the GROUP BY clause refers to that column instead of the number.

For example,
select a,b from c group by 1,2
becomes
select a,b from c group by a,b
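A minimal sketch of this rewrite, assuming a hypothetical toy expression model (`Column`, `Literal`, `Aggregate`) rather than Spark Catalyst's real classes. The FROM clause is not modeled, positions are 1-based, and an out-of-range position is rejected with an ordinary `IllegalArgumentException` (an illustrative choice, not the patch's behavior):

```scala
// Hypothetical toy types; these are not Spark Catalyst's expression classes.
sealed trait Expr
final case class Column(name: String) extends Expr
final case class Literal(value: Any) extends Expr

// A SELECT list plus a GROUP BY list (the FROM clause is omitted).
final case class Aggregate(select: Seq[Expr], groupBy: Seq[Expr])

object GroupByOrdinals {
  /** Replaces each integer literal n in GROUP BY with the nth SELECT expression (1-based). */
  def substitute(agg: Aggregate): Aggregate = {
    val resolved = agg.groupBy.map {
      case Literal(n: Int) =>
        if (n < 1 || n > agg.select.length)
          throw new IllegalArgumentException(
            s"GROUP BY position $n is not in select list (valid range is [1, ${agg.select.length}])")
        agg.select(n - 1)
      case other => other // non-ordinal grouping expressions are left untouched
    }
    agg.copy(groupBy = resolved)
  }

  def main(args: Array[String]): Unit = {
    // select a, b from c group by 1, 2
    val query = Aggregate(Seq(Column("a"), Column("b")), Seq(Literal(1), Literal(2)))
    println(substitute(query).groupBy) // List(Column(a), Column(b))
  }
}
```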

@marmbrus
Contributor

marmbrus commented Dec 1, 2015

Thanks for working on this. This seems reasonable to support, but I have two suggestions:

  • I would probably implement this as a rule in the Analyzer so that it is not specific to the Hive parser.
  • Please add a unit test, probably in SQLQuerySuite.

@rxin
Contributor

rxin commented Dec 1, 2015

@dereksabryfb you should also add the email address you used in your git commit to your GitHub profile so that the commit shows up on GitHub.

@dereksabryfb
Author

Thanks for your feedback! The change is now implemented as a rule in the Analyzer and a unit test has been added. Let me know if there are any other changes I need to make.

Contributor

I'm not sure we want to match on the pretty string, since I think that would also trigger for things like the string literal "1". Instead, I'd consider matching on a Literal of NumericType.
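A small illustration of the concern, assuming hypothetical toy types (`Literal` with a `dataType` field and a `prettyString` stand-in) rather than Catalyst's real classes. A check based on the printed form cannot tell the integer 1 from the string '1', while a check on the literal's type can. The reviewer suggests NumericType; this sketch narrows the check to IntegerType, an assumption made here because a column position must be a whole number:

```scala
// Hypothetical, simplified types; not Catalyst's actual classes.
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType

sealed trait Expr
final case class Column(name: String) extends Expr
final case class Literal(value: Any, dataType: DataType) extends Expr {
  // Stand-in for a "pretty string": the value printed without quotes.
  def prettyString: String = value.toString
}

object OrdinalMatching {
  // Fragile: decides by looking at the printed form of the expression.
  def isOrdinalByPrettyString(e: Expr): Boolean = e match {
    case l: Literal => l.prettyString.nonEmpty && l.prettyString.forall(_.isDigit)
    case _          => false
  }

  // Safer: decides by the literal's data type and runtime value.
  def isOrdinalByType(e: Expr): Boolean = e match {
    case Literal(_: Int, IntegerType) => true
    case _                            => false
  }

  def main(args: Array[String]): Unit = {
    val intOne = Literal(1, IntegerType)
    val strOne = Literal("1", StringType) // the string literal '1'
    println(isOrdinalByPrettyString(strOne)) // true: misfires on '1'
    println(isOrdinalByType(strOne))         // false
    println(isOrdinalByType(intOne))         // true
  }
}
```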

@marmbrus
Contributor

marmbrus commented Dec 7, 2015

Are there other cases that we should handle here? Can you do the same thing in ORDER BY normally?

@dereksabryfb
Author

Thanks for the feedback! I'm making the changes you suggested. Regarding the ORDER BY clause, this appears to be partially handled by ResolveSortReferences in the Analyzer. Because standard SQL allows literals in the ORDER BY clause, that rule simply sorts by the literal value 1. In HiveQL, however, 1 is interpreted as a column reference, just as it is in the GROUP BY clause.

I can add the case for a Sort() with an IntegerType. I assume no one is currently relying on the existing behavior of 'sort by 1'.
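The same substitution applied to ORDER BY can be sketched as follows, again assuming hypothetical toy types (`SortOrder`, `Query`) rather than Catalyst's Sort and SortOrder. The only extra detail compared with GROUP BY is that each sort key carries a direction, which the rewrite keeps:

```scala
// Hypothetical toy types; not Catalyst's Sort/SortOrder classes.
sealed trait Expr
final case class Column(name: String) extends Expr
final case class Literal(value: Any) extends Expr

final case class SortOrder(child: Expr, ascending: Boolean)
final case class Query(select: Seq[Expr], orderBy: Seq[SortOrder])

object OrderByOrdinals {
  /** Rewrites ORDER BY n to ORDER BY <nth select expression>, keeping the sort direction. */
  def substitute(q: Query): Query = {
    val resolved = q.orderBy.map {
      case SortOrder(Literal(n: Int), asc) =>
        // require throws IllegalArgumentException for an out-of-range position
        require(n >= 1 && n <= q.select.length, s"ORDER BY position $n is not in select list")
        SortOrder(q.select(n - 1), asc)
      case other => other // ordinary sort keys are left untouched
    }
    q.copy(orderBy = resolved)
  }

  def main(args: Array[String]): Unit = {
    // select a, b from c order by 2 desc, a
    val q = Query(
      Seq(Column("a"), Column("b")),
      Seq(SortOrder(Literal(2), ascending = false), SortOrder(Column("a"), ascending = true)))
    println(substitute(q).orderBy) // List(SortOrder(Column(b),false), SortOrder(Column(a),true))
  }
}
```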

@marmbrus
Contributor

marmbrus commented Dec 8, 2015

We'll have to note the change in the release notes, but since sorting by a constant is a no-op, I think we can safely change the behavior here.

@dereksabryfb
Author

Added a case for sort

@marmbrus
Contributor

ok to test

@SparkQA

SparkQA commented Dec 10, 2015

Test build #47534 has finished for PR 10052 at commit 8a5a4f6.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dereksabryfb
Author

Apologies, I haven't been able to run ./dev/run-tests; it fails with the following exception: http://pastebin.com/L0p0sjtJ

As a result, I wasn't able to catch the style issues, and I'm not sure whether there are other problems that the build doesn't flag.

@marmbrus
Contributor

I'd try running the following locally: build/sbt scalastyle test:scalastyle catalyst/test sql/test

Each of those commands can also be run separately, and you can prefix a command with ~ to rerun it whenever something changes, which lets you iterate more quickly, e.g. build/sbt ~scalastyle

@SparkQA

SparkQA commented Dec 10, 2015

Test build #47538 has finished for PR 10052 at commit bd453d5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dereksabryfb
Author

It looks like it fails on the query "SELECT a, count(2) FROM testData2 GROUP BY a, 2", because 2 now refers to the column count(2), which is not valid in a GROUP BY clause in Hive. How should I go about resolving this? The purpose of that test is to check that literals in the GROUP BY clause don't change the results, whereas this patch does the opposite. I could modify the offending case, but I feel the whole test may be irrelevant with this patch.

@marmbrus
Contributor

We should probably throw an AnalysisException if the user specifies a column ordinal that refers to an aggregate expression. The fact that such a query makes it all the way to execution is pretty confusing for a user.

Regarding the test, we can probably remove it.
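One way to fail early is to check, while resolving an ordinal, whether the SELECT expression it points at is an aggregate. The sketch below assumes hypothetical toy types; `Count` stands in for any aggregate function and `AnalysisError` stands in for Spark's AnalysisException, whose real constructor is not used here:

```scala
// Hypothetical toy model; AnalysisError stands in for Spark's AnalysisException.
sealed trait Expr
final case class Column(name: String) extends Expr
final case class Literal(value: Any) extends Expr
final case class Count(child: Expr) extends Expr // stands in for an aggregate function

final class AnalysisError(msg: String) extends Exception(msg)

object OrdinalChecks {
  def isAggregate(e: Expr): Boolean = e match {
    case Count(_) => true
    case _        => false
  }

  /** Resolves GROUP BY ordinals, rejecting any ordinal that points at an aggregate. */
  def resolveGroupBy(select: Seq[Expr], groupBy: Seq[Expr]): Seq[Expr] =
    groupBy.map {
      case Literal(n: Int) if n >= 1 && n <= select.length =>
        val target = select(n - 1)
        if (isAggregate(target))
          throw new AnalysisError(s"GROUP BY position $n refers to an aggregate function: $target")
        target
      case Literal(n: Int) =>
        throw new AnalysisError(s"GROUP BY position $n is not in select list")
      case other => other
    }

  def main(args: Array[String]): Unit = {
    // SELECT a, count(2) FROM testData2 GROUP BY a, 2
    val select = Seq(Column("a"), Count(Literal(2)))
    try {
      resolveGroupBy(select, Seq(Column("a"), Literal(2)))
      println("resolved")
    } catch {
      case e: AnalysisError => println(s"analysis failed: ${e.getMessage}")
    }
  }
}
```

Raising the error during analysis gives the user a message that names the offending position, instead of a failure that only surfaces at execution time.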

@dereksabryfb
Author

I removed the offending test.

I found that no AnalysisException was thrown even when the GROUP BY clause contained an explicit aggregate (e.g. select a from b group by count(a)); such a query failed in the same way. So I added the check to CheckAnalysis. If you think this is out of scope for this pull request (since it isn't strictly related to number references), I can create a new task and attach just that commit to it.

Thanks again for your feedback.
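A check that covers explicit aggregates needs to look inside grouping expressions, since an aggregate can be nested (for example count(a) + 1). The sketch below is a simplified illustration under hypothetical toy types; it is not the code added to CheckAnalysis, and `AnalysisError` stands in for Spark's AnalysisException:

```scala
// Hypothetical toy expression tree; not Catalyst's classes or CheckAnalysis itself.
sealed trait Expr { def children: Seq[Expr] }
final case class Column(name: String) extends Expr { def children: Seq[Expr] = Nil }
final case class Literal(value: Any) extends Expr { def children: Seq[Expr] = Nil }
final case class Add(left: Expr, right: Expr) extends Expr { def children: Seq[Expr] = Seq(left, right) }
final case class Count(child: Expr) extends Expr { def children: Seq[Expr] = Seq(child) } // an aggregate

final class AnalysisError(msg: String) extends Exception(msg)

object GroupingCheck {
  /** True if e is, or contains anywhere below it, an aggregate function. */
  def containsAggregate(e: Expr): Boolean = e match {
    case Count(_) => true
    case other    => other.children.exists(containsAggregate)
  }

  /** Fails analysis when any grouping expression contains an aggregate. */
  def checkGroupBy(groupBy: Seq[Expr]): Unit =
    groupBy.find(containsAggregate).foreach { bad =>
      throw new AnalysisError(s"aggregate functions are not allowed in GROUP BY, but found $bad")
    }

  def main(args: Array[String]): Unit = {
    checkGroupBy(Seq(Column("a"))) // passes silently
    // select a from b group by count(a) + 1
    try checkGroupBy(Seq(Add(Count(Column("a")), Literal(1))))
    catch { case e: AnalysisError => println(s"analysis failed: ${e.getMessage}") }
  }
}
```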

@SparkQA

SparkQA commented Dec 11, 2015

Test build #47566 has finished for PR 10052 at commit 09b3f77.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 11, 2015

Test build #47565 has finished for PR 10052 at commit e4edc31.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

case Literal(index: Int) => is easier, and it also eliminates the need for group.toString.toInt.
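The difference between the two approaches can be sketched with a hypothetical one-field `Literal` that mirrors the reviewer's pattern (Catalyst's real Literal also carries a data type). The typed pattern binds `index` as an Int directly, and a non-integer literal simply falls through; parsing the printed form instead throws a NumberFormatException for a value like "a":

```scala
// Hypothetical one-field Literal mirroring the reviewer's pattern;
// Catalyst's real Literal also carries a data type.
final case class Literal(value: Any)

object LiteralIndex {
  // The approach being replaced: print the value, then parse it back.
  def indexViaToString(group: Literal): Int = group.value.toString.toInt

  // The suggested approach: a typed pattern binds `index` as an Int directly.
  def indexViaPattern(group: Literal): Option[Int] = group match {
    case Literal(index: Int) => Some(index)
    case _                   => None // non-integer literals are not ordinals
  }

  def main(args: Array[String]): Unit = {
    println(indexViaPattern(Literal(2)))   // Some(2)
    println(indexViaPattern(Literal("a"))) // None
    try println(indexViaToString(Literal("a")))
    catch { case e: NumberFormatException => println(s"toString.toInt failed: ${e.getMessage}") }
  }
}
```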

@hvanhovell
Contributor

@dereksabryfb are you still working on this?

@srowen
Member

srowen commented May 6, 2016

Ping @dereksabryfb: are you working on this? If not, please close it.

@hvanhovell
Contributor

I think this one can be closed; it has been implemented in #11846.

@JoshRosen
Contributor

Yep, it looks like this PR has been subsumed by #11846. @dereksabryfb, could you please close this pull request? Thanks!


7 participants