@rxin
Contributor

@rxin rxin commented Jun 4, 2015

This patch replaces Distinct with Aggregate in the optimizer, so Distinct will become
more efficient over time as we optimize Aggregate (via Tungsten).

…gate.

Distinct is equivalent to an Aggregate that groups by all output columns, and Aggregate is an important operator to optimize.
This patch replaces Distinct with Aggregate in the optimizer, so Distinct will become
more efficient over time as we optimize Aggregate.
Contributor

Minor: It seems that Once is enough here. The same applies to the "Remove SubQueries" batch above.

@liancheng
Contributor

LGTM except for a minor issue.

Contributor

An example in the code comment would help readers understand the rewrite:

SELECT DISTINCT f1, f2 FROM t  ==>  SELECT f1, f2 FROM t GROUP BY f1, f2
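To make the rewrite concrete, here is a minimal sketch using a toy plan tree. The node classes (Relation, Project, Distinct, Aggregate) and the function replace_distinct_with_aggregate are hypothetical stand-ins, not Catalyst's classes; the sketch only assumes the idea described in this PR, namely that a Distinct over some child becomes an Aggregate that groups by, and outputs, all of the child's columns.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical toy logical-plan nodes, NOT Spark Catalyst's classes.


@dataclass(frozen=True)
class Relation:
    name: str
    output: Tuple[str, ...]


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]
    child: "Plan"

    @property
    def output(self) -> Tuple[str, ...]:
        return self.columns


@dataclass(frozen=True)
class Distinct:
    child: "Plan"

    @property
    def output(self) -> Tuple[str, ...]:
        # Distinct removes duplicate rows but keeps the child's columns.
        return self.child.output


@dataclass(frozen=True)
class Aggregate:
    grouping: Tuple[str, ...]
    aggregates: Tuple[str, ...]
    child: "Plan"

    @property
    def output(self) -> Tuple[str, ...]:
        return self.aggregates


Plan = Union[Relation, Project, Distinct, Aggregate]


def replace_distinct_with_aggregate(plan: Plan) -> Plan:
    """Rewrite every Distinct(child) into a GROUP BY on all of child's output columns."""
    if isinstance(plan, Relation):
        return plan
    if isinstance(plan, Project):
        return Project(plan.columns, replace_distinct_with_aggregate(plan.child))
    if isinstance(plan, Aggregate):
        return Aggregate(plan.grouping, plan.aggregates,
                         replace_distinct_with_aggregate(plan.child))
    if isinstance(plan, Distinct):
        # Rewrite the subtree first, then group by everything it produces.
        child = replace_distinct_with_aggregate(plan.child)
        return Aggregate(child.output, child.output, child)
    raise TypeError(f"unknown plan node: {plan!r}")


# SELECT DISTINCT f1, f2 FROM t
t = Relation("t", ("f1", "f2", "f3"))
plan = Distinct(Project(("f1", "f2"), t))
print(replace_distinct_with_aggregate(plan))
```

For SELECT DISTINCT f1, f2 FROM t, the rewritten plan is an Aggregate grouping on (f1, f2) over the same Project, i.e. the GROUP BY form in the example above, and its output columns are unchanged.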

@SparkQA

SparkQA commented Jun 4, 2015

Test build #34170 has finished for PR 6637 at commit 87e4741.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor Author

rxin commented Jun 4, 2015

I updated the comment but left the Once/FixedPoint strategies in place; if we want to change that, we can do it in the future. Since Michael wrote the original code, I'm not sure whether anything there requires running these batches to a fixed point.
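The Once versus fixed-point question can be illustrated with a toy rule executor. Catalyst's RuleExecutor does offer Once and FixedPoint(maxIterations) batch strategies, but everything below (the tuple-based plan, run_once, run_fixed_point, and both rules) is a hypothetical sketch, not Spark code. It shows the reasoning behind the review comment: if a rule rewrites every matching node in a single traversal, a second pass changes nothing, so Once gives the same result as running to a fixed point; a rule that makes only one local change per pass would need FixedPoint.

```python
from typing import Callable, List, Tuple

# Hypothetical toy model: a plan is a linear chain of operator names, root first.
Plan = Tuple[str, ...]
Rule = Callable[[Plan], Plan]


def run_once(plan: Plan, rules: List[Rule]) -> Plan:
    """Like a Once batch: apply each rule a single time, in order."""
    for rule in rules:
        plan = rule(plan)
    return plan


def run_fixed_point(plan: Plan, rules: List[Rule], max_iterations: int = 100) -> Plan:
    """Like a FixedPoint batch: repeat the batch until the plan stops changing
    or max_iterations passes have run."""
    for _ in range(max_iterations):
        new_plan = run_once(plan, rules)
        if new_plan == plan:
            break
        plan = new_plan
    return plan


def replace_all_distinct(plan: Plan) -> Plan:
    # Rewrites every Distinct in one traversal, so a second pass is a no-op.
    return tuple("Aggregate" if op == "Distinct" else op for op in plan)


def replace_first_distinct(plan: Plan) -> Plan:
    # Rewrites only the first Distinct it finds, so it needs repeated passes.
    if "Distinct" not in plan:
        return plan
    i = plan.index("Distinct")
    return plan[:i] + ("Aggregate",) + plan[i + 1:]


plan = ("Distinct", "Project", "Distinct", "Relation")
print(run_once(plan, [replace_all_distinct]))
print(run_once(plan, [replace_first_distinct]))
print(run_fixed_point(plan, [replace_first_distinct]))
```

Here a single pass of replace_all_distinct already removes both Distinct nodes, whereas replace_first_distinct leaves the second one in place after one pass and needs the fixed-point loop to finish the job.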

@SparkQA

SparkQA commented Jun 4, 2015

Test build #34198 has finished for PR 6637 at commit 93d6117.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 4, 2015

Test build #34200 has finished for PR 6637 at commit b3cc50e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@asfgit asfgit closed this in 2bcdf8c Jun 4, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
…gate

This patch replaces Distinct with Aggregate in the optimizer, so Distinct will become
more efficient over time as we optimize Aggregate (via Tungsten).

Author: Reynold Xin <[email protected]>

Closes apache#6637 from rxin/replace-distinct and squashes the following commits:

b3cc50e [Reynold Xin] Mima excludes.
93d6117 [Reynold Xin] Code review feedback.
87e4741 [Reynold Xin] [SPARK-7440][SQL] Remove physical Distinct operator in favor of Aggregate.
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
3 participants