[SPARK-12063][SQL] Use number in group by clause to refer to columns #10052
Thanks for working on this. This seems reasonable to support, but I have two suggestions:
|
|
@dereksabryfb you should also add the email you used in your git commit to your github profile so it shows up on github. |
|
Thanks for your feedback! The change is now implemented as a rule in the Analyzer and a unit test has been added. Let me know if there are any other changes I need to make. |
I'm not sure we want to match on the pretty string, as I think this would also trigger for things like "1". Instead, I'd consider matching on a Literal of NumericType.
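To illustrate the concern above, here is a minimal sketch of why a string-based check is looser than a type-based one. The `Expr`, `IntLiteral`, and `StringLiteral` types and the `pretty` method are hypothetical stand-ins, not Catalyst's classes, so this only shows the shape of the problem.

```scala
object OrdinalMatch {
  // Hypothetical toy expressions; not Spark's Catalyst types.
  sealed trait Expr { def pretty: String }
  case class IntLiteral(value: Int) extends Expr { def pretty: String = value.toString }
  case class StringLiteral(value: String) extends Expr { def pretty: String = value }

  // String-based check: also fires for the string literal "1",
  // because its rendered form consists only of digits.
  def byPrettyString(e: Expr): Boolean =
    e.pretty.nonEmpty && e.pretty.forall(_.isDigit)

  // Type-based check: only an integer literal qualifies.
  def byType(e: Expr): Boolean = e match {
    case IntLiteral(_) => true
    case _             => false
  }
}
```

With these definitions, `OrdinalMatch.byPrettyString(OrdinalMatch.StringLiteral("1"))` is true while `OrdinalMatch.byType` on the same value is false, which is the distinction the comment is asking for.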
|
Are there other cases that we should handle here? Can you do the same thing in ORDER BY normally? |
|
Thanks for the feedback! I'm making the changes you suggested. The ORDER BY clause looks to be semi-handled by ResolveSortReferences in the Analyzer. Because standard SQL allows literals in the ORDER BY clause, this rule just sorts by the literal value '1'. In HiveQL, '1' is interpreted as a column position, the same way it is in the GROUP BY clause. I can add the case for a Sort() with an IntegerType. I assume no one is relying on how 'sort by 1' currently behaves. |
|
We'll have to note the change in the release notes, but since it's a no-op to sort by a constant, I think we can safely change behavior here. |
|
Added a case for sort |
|
ok to test |
|
Test build #47534 has finished for PR 10052 at commit
|
|
Apologies, I haven't been able to run ./dev/run-tests, getting the following exception: http://pastebin.com/L0p0sjtJ so I wasn't able to pick up the style issues, and I'm not sure if there's more that the build doesn't flag. |
|
I'd try the following locally Each of those commands can be run separately too and you can use ~ to rerun whenever something changes to iterate more quickly |
|
Test build #47538 has finished for PR 10052 at commit
|
|
Looks like it fails on the query "SELECT a, count(2) FROM testData2 GROUP BY a, 2" because 2 now refers to the column count(2) which is not valid syntax for a group by clause in hive. How should I go about resolving this? The purpose of the test is to see that literals in the group by clause don't modify the results, but the purpose of this patch is to do the opposite of that. I could modify the offending case, but I feel the whole test may be irrelevant with this patch. |
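One way to picture the failing case is a resolver that refuses positions it cannot sensibly substitute. This is only a sketch under assumptions: the toy `Expr` types, the `Either`-based error reporting, and the out-of-range check are all hypothetical and are not taken from the patch.

```scala
object OrdinalValidation {
  // Hypothetical toy expressions; not Spark's Catalyst types.
  sealed trait Expr
  case class Column(name: String) extends Expr
  case class Count(child: Expr) extends Expr // stand-in for any aggregate
  case class IntLiteral(value: Int) extends Expr

  // Resolve one GROUP BY expression against the SELECT list.
  // Left carries an error message, Right the resolved expression.
  def resolveOne(select: Seq[Expr], e: Expr): Either[String, Expr] = e match {
    case IntLiteral(n) if n < 1 || n > select.length =>
      Left(s"GROUP BY position $n is not in the select list (1 to ${select.length})")
    case IntLiteral(n) =>
      select(n - 1) match {
        case _: Count => Left(s"GROUP BY position $n refers to an aggregate expression")
        case target   => Right(target)
      }
    case other => Right(other)
  }
}
```

For a select list of `a, count(2)`, position 1 resolves to column `a`, while position 2 is rejected because it points at the aggregate.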
|
We should probably throw an Regarding the test, we can probably remove it. |
|
I removed the offending test. I found that there was no Thanks again for your feedback. |
|
Test build #47566 has finished for PR 10052 at commit
|
|
Test build #47565 has finished for PR 10052 at commit
|
case Literal(index: Int) => is easier. It also eliminates the need for group.toString.toInt
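A small sketch of the difference, using a simplified, hypothetical `Literal` with a single `value` field. Catalyst's own `Literal` may carry more than the value (this toy version is an assumption), so the exact pattern could need adjusting there.

```scala
object LiteralPattern {
  // Hypothetical, simplified expressions; not Spark's Catalyst types.
  sealed trait Expr
  case class Literal(value: Any) extends Expr
  case class Column(name: String) extends Expr

  // Round trip through a string to recover the integer.
  def viaString(e: Expr): Option[Int] = e match {
    case Literal(v) if v.isInstanceOf[Int] => Some(v.toString.toInt)
    case _                                 => None
  }

  // Typed pattern: matches only Int values and binds the index directly.
  def viaPattern(e: Expr): Option[Int] = e match {
    case Literal(index: Int) => Some(index)
    case _                   => None
  }
}
```

Both functions agree on integer literals, but the typed pattern needs no guard and no string conversion.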
|
@dereksabryfb are you still working on this? |
|
Ping @dereksabryfb -- are you working on this? or else close it |
|
I think this one can be closed, it has been implemented in #11846 |
|
Yep, it looks like this PR has been subsumed by #11846. @dereksabryfb, could you please close this pull request? Thanks! |
If there is a number n in a group by clause, the nth column in the select clause is used to modify the group by clause to refer to this column instead of the number.
e.g.
select a,b from c group by 1,2
becomes
select a,b from c group by a,b
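The substitution described above can be sketched as a small rewrite over toy expressions. The `Expr` types and the `GroupByOrdinal.resolve` helper are hypothetical and only model the idea; the actual change is implemented as a rule in the Analyzer, which this does not reproduce. Leaving out-of-range numbers untouched is an assumption made here for simplicity.

```scala
object GroupByOrdinal {
  // Hypothetical toy expressions; not Spark's Catalyst types.
  sealed trait Expr
  case class Column(name: String) extends Expr
  case class IntLiteral(value: Int) extends Expr

  // Replace each integer literal n in the GROUP BY list with the nth
  // (1-based) expression of the SELECT list; leave everything else alone.
  def resolve(select: Seq[Expr], groupBy: Seq[Expr]): Seq[Expr] =
    groupBy.map {
      case IntLiteral(n) if n >= 1 && n <= select.length => select(n - 1)
      case other                                         => other
    }
}
```

Calling `GroupByOrdinal.resolve` with the select list `a, b` and the group-by list `1, 2` yields the group-by list `a, b`, matching the example above.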