[SPARK-10034][SQL] add regression test for Sort on Aggregate #8231
cc @marmbrus
Test build #40996 has finished for PR 8231 at commit
I would really like to avoid needing to make these names unique since that is redundant with expression IDs. Instead I think we should add a field resolvable to Attribute and set it to false for any constructed aliases. We can then skip these attributes when doing resolution.
Do you mean we should add a new subclass of Attribute that refers to another NamedExpression by exprId?
Not a subclass, just another field (like qualifiers) that we can use to say that a given attribute produced by an alias can't be resolved by name, but only by explicitly calling toAttribute on the alias.
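To make the suggestion concrete, here is a minimal sketch of how such a flag might behave. It is a toy model, not Catalyst code: `Attr`, `ResolvableDemo`, and `resolve` are hypothetical names, and only the idea of a `resolvable` field that name resolution skips comes from the comment above.

```scala
// Toy model only: this is NOT Spark's Attribute class.
final case class Attr(name: String, exprId: Long, resolvable: Boolean = true)

object ResolvableDemo {
  // Name-based resolution ignores attributes marked as not resolvable.
  def resolve(name: String, output: Seq[Attr]): Seq[Attr] =
    output.filter(a => a.resolvable && a.name == name)

  val output: Seq[Attr] = Seq(
    Attr("total", exprId = 1L),                            // ordinary, user-visible column
    Attr("_aggOrdering", exprId = 2L, resolvable = false)  // attribute of a constructed alias
  )
}
```

In this sketch a user query can never bind to `_aggOrdering` by name, while code holding the alias can still reach it through its exprId.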
cc @marmbrus, I created an UnresolvedReference to solve this issue.
In this case we need a "placeholder" that refers to the Alias that still needs to be resolved. Previously we used UnresolvedAttribute as the placeholder, which depends on a unique name (calling toAttribute on an unresolved Alias returns an UnresolvedAttribute anyway), so I added a new kind of placeholder that depends on exprId.
I'm not sure how to solve it by adding a resolvable field to Attribute, maybe I missed something here?
Test build #41306 has finished for PR 8231 at commit
Test build #41532 has finished for PR 8231 at commit
@cloud-fan could you update the title of this PR to mention that it adds regression tests only? @marmbrus does this LGTY?
Hi @marmbrus @andrewor14, I have updated the description and title of this PR to explain what the problem was and why it disappeared. Does this LGTY? @marmbrus, I think we don't need to add a field to
Test build #41906 has finished for PR 8231 at commit
I am fairly confident that given enough time, we could find another case where you can reference an artificial name and cause analysis to fail. We can merge this test though.
Before #8371, there was a bug for Sort on Aggregate: we couldn't use aggregate expressions named _aggOrdering, and couldn't use more than one ordering expression containing aggregate functions. The reason for this bug: the aggregate expression in SortOrder never got resolved, so we aliased it with _aggOrdering and called toAttribute, which gave us an UnresolvedAttribute. So we were actually referencing the aggregate expression by name, not by exprId as we thought. If there was already an aggregate expression named _aggOrdering, or more than one ordering expression contained aggregate functions, the names conflicted and the lookup by name failed.
However, after #8371 got merged, the SortOrders are guaranteed to be resolved and we always reference aggregate expressions by exprId. The bug no longer exists, and this PR adds regression tests for it.
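The difference between the two lookup strategies can be sketched with a toy model. This is an illustration under stated assumptions, not Catalyst code: `ExprId`, `Alias`, `AggOrderingDemo`, `resolveByName`, and `resolveById` here are simplified, hypothetical stand-ins, and the expression strings are made up.

```scala
// Toy model only: these are NOT Spark's Catalyst classes.
final case class ExprId(id: Long)
final case class Alias(child: String, name: String, exprId: ExprId)

object AggOrderingDemo {
  // Name-based lookup: succeeds only if exactly one alias carries the name.
  def resolveByName(name: String, aliases: Seq[Alias]): Either[String, Alias] =
    aliases.filter(_.name == name) match {
      case Seq(single) => Right(single)
      case Seq()       => Left(s"cannot resolve '$name'")
      case many        => Left(s"ambiguous reference '$name': ${many.size} matches")
    }

  // exprId-based lookup: ids are unique, so shared names do not matter.
  def resolveById(id: ExprId, aliases: Seq[Alias]): Option[Alias] =
    aliases.find(_.exprId == id)

  // Two ordering expressions with aggregate functions,
  // both aliased with the same artificial name.
  val aliases: Seq[Alias] = Seq(
    Alias("sum(b)", "_aggOrdering", ExprId(1L)),
    Alias("max(b)", "_aggOrdering", ExprId(2L))
  )

  def main(args: Array[String]): Unit = {
    println(resolveByName("_aggOrdering", aliases)) // fails: the name is shared
    println(resolveById(ExprId(2L), aliases))       // finds the max(b) alias
  }
}
```

With two aliases sharing the artificial name, the name-based lookup cannot pick one, while the exprId-based lookup is unaffected. That is the behaviour the description attributes to the code before and after #8371, respectively.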