Skip to content

Conversation

@cloud-fan
Copy link
Contributor

Before #8371, there was a bug for Sort on Aggregate that we can't use aggregate expressions named _aggOrdering and can't use more than one ordering expressions which contains aggregate functions. The reason of this bug is that: The aggregate expression in SortOrder never get resolved, we alias it with _aggOrdering and call toAttribute which gives us an UnresolvedAttribute. So actually we are referencing aggregate expression by name, not by exprId like we thought. And if there is already an aggregate expression named _aggOrdering or there are more than one ordering expressions having aggregate functions, we will have conflict names and can't search by name.

However, after #8371 got merged, the SortOrders are guaranteed to be resolved and we are always referencing aggregate expression by exprId. The Bug doesn't exist anymore and this PR add regression tests for it.

@cloud-fan
Copy link
Contributor Author

cc @marmbrus

@SparkQA
Copy link

SparkQA commented Aug 16, 2015

Test build #40996 has finished for PR 8231 at commit c660a7b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would really like to avoid needing to make these names unique since that is redundant with expression IDs. Instead I think we should add a field resolvable to Attribute and set it to false for any constructed aliases. We can then skip these attributes when doing resolution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean we should add a new subclass of Attribute that reference to other NamedExpression by exprId?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a subclass, just another field (like qualifiers) that we can use to say that a given attribute produced by an alias can't be resolved by name, but only by explicitly calling toAttribute on the alias.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @marmbrus , I created an UnresolvedReference to solve this issue.

In this case we need a "placeholder" here to reference to the still-need-resolve Alias. Previously we use UnresolvedAttribute as "placeholder" which depends on unique name(call toAttribute on an unresolved Alias will return an UnresolvedAttribute anyway), so I added a new kind of "placeholder" that depends on exprId.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to solve it by adding a resolvable field to Attribute, maybe I missed something here?

@SparkQA
Copy link

SparkQA commented Aug 20, 2015

Test build #41306 has finished for PR 8231 at commit f2fdfac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor Author

The issue has been fixed by #8371, so this PR only added the regression test.
cc @marmbrus

@SparkQA
Copy link

SparkQA commented Aug 25, 2015

Test build #41532 has finished for PR 8231 at commit 36c11c6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Copy link
Contributor

@cloud-fan could you update the title of this PR to mention that it adds regression tests only? @marmbrus does this LGTY?

@marmbrus
Copy link
Contributor

marmbrus commented Sep 1, 2015

_aggOrdering works, but thats not terribly interesting to test. After my PR there is likely a different alias that will cause problems. Either way, an optimal solution would add a field to Attribute that prevents it from being resolved by name. We can set this to true in artificially created aliases.

@cloud-fan cloud-fan changed the title [SPARK-10034][SQL] Can't analyze Sort on Aggregate with aggregation expression named "_aggOrdering" [SPARK-10034][SQL] add regression test for Sort on Aggregate Sep 2, 2015
@cloud-fan
Copy link
Contributor Author

Hi @marmbrus @andrewor14 , I have updated the description and title of this PR, to explain what was the problem before and why it disappeared. Is this LGTY?

@marmbrus , I think we don't need to add a field to Attribute anymore as now we only reference pushed down aggregate expressions by exprId, not by name.

@SparkQA
Copy link

SparkQA commented Sep 2, 2015

Test build #41906 has finished for PR 8231 at commit 845b161.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Copy link
Contributor

marmbrus commented Sep 2, 2015

I am fairly confident that given enough time, we could find another case where you can reference an artificial name and cause analysis to fail. We can merge this test though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants