@yjshen
Member

@yjshen yjshen commented Sep 14, 2015

Intersect and Except are both set operators, and they use all the columns to compare rows for equality. Pushing their Project parent down changes the relations they operate on, so it is not an equivalent transformation.

JIRA: https://issues.apache.org/jira/browse/SPARK-10539
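
A minimal sketch of the problem, using plain Python sets of tuples as a stand-in for relations rather than Spark's actual Catalyst plans. The helper names `project`, `intersect`, and `except_` and the sample rows are assumptions made for illustration; they only model set semantics.

```python
# Rows are tuples (a, b); relations are sets of rows (set semantics).
left = {(1, 10), (2, 20)}
right = {(1, 99), (2, 20)}


def project(rows, *indices):
    """Keep only the columns at the given positions."""
    return {tuple(row[i] for i in indices) for row in rows}


def intersect(l, r):
    return l & r


def except_(l, r):
    return l - r


# Original plans: the set operator compares full rows, then column a is projected.
intersect_then_project = project(intersect(left, right), 0)
except_then_project = project(except_(left, right), 0)

# Rewritten plans: the projection sits below the set operator,
# so rows are now compared on column a alone.
project_then_intersect = intersect(project(left, 0), project(right, 0))
project_then_except = except_(project(left, 0), project(right, 0))

print(intersect_then_project == project_then_intersect)  # False
print(except_then_project == project_then_except)        # False
```

In this sketch the row `(1, 10)` has no full-row match on the right side, but once column `b` is dropped it matches `(1, 99)` on column `a`, which is why both rewritten plans return different results.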

@SparkQA

SparkQA commented Sep 14, 2015

Test build #42419 has finished for PR 8742 at commit 040b60a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yjshen
Member Author

yjshen commented Sep 14, 2015

I need to investigate set operators further to make sure I'm doing the right thing. Closing it for now.

@yjshen yjshen closed this Sep 14, 2015
@yjshen yjshen reopened this Sep 14, 2015
@yjshen yjshen changed the title from "[SPARK-10539][SQL]Fix set optimization by eliminate empty project list push down" to "[SPARK-10539][SQL]Project should not be pushed down through Intersect or Except" Sep 14, 2015
@SparkQA

SparkQA commented Sep 14, 2015

Test build #42432 has finished for PR 8742 at commit ce6ed80.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Sep 14, 2015

cc @yhuai for review.

Contributor

Can we add comments in this class to explain why we cannot push down projections? For filter pushdown, if the condition contains non-deterministic expressions, pushing the filter down is not safe in some cases. That should not happen here because of #7446, but it is still worth checking whether there is any case where filter pushdown is unsafe. If we determine that filter pushdown is safe, let's add comments explaining why.
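
To make the non-deterministic concern concrete, here is a rough sketch in plain Python, not Spark code. The `make_flaky_predicate` helper is a hypothetical stand-in for a non-deterministic condition: it alternates between `True` and `False` on successive calls so that the outcome is reproducible. It says nothing about what #7446 actually changes.

```python
from itertools import cycle

# Both sides contain the same single row, so Intersect yields that row.
left = [(1, 10)]
right = [(1, 10)]


def make_flaky_predicate():
    """Stand-in for a non-deterministic condition: alternates True, False, ...
    regardless of the row it is given."""
    answers = cycle([True, False])
    return lambda row: next(answers)


def intersect(l, r):
    return [row for row in l if row in r]


# Filter above Intersect: the predicate is evaluated once per output row.
pred = make_flaky_predicate()
filter_above = [row for row in intersect(left, right) if pred(row)]

# Filter pushed below Intersect: the predicate is evaluated separately on
# each side, and the two evaluations for the same row can disagree.
pred = make_flaky_predicate()
pushed_left = [row for row in left if pred(row)]    # first call returns True
pushed_right = [row for row in right if pred(row)]  # second call returns False
filter_below = intersect(pushed_left, pushed_right)

print(filter_above)  # [(1, 10)]
print(filter_below)  # []
```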

Member Author

@yhuai, thanks for your comment. I didn't consider the effect of non-deterministic filters on pushdown when I wrote this. I will think about it and add comments soon.

Contributor

Can you add comments here explaining why we cannot push down projections and why we can push down filters?
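
One way to see why filters differ from projections: a deterministic predicate gives the same answer for identical rows, so filtering both inputs keeps or drops matching rows together. The sketch below checks this exhaustively over a tiny universe of rows in plain Python. The names `all_relations`, `keep`, and `predicate` are made up for illustration, and checking one predicate on four rows is an illustration, not a proof.

```python
from itertools import chain, combinations

# Four possible rows (a, b), with a and b each 0 or 1.
universe = [(a, b) for a in range(2) for b in range(2)]


def all_relations(rows):
    """Every subset of `rows`, each as a frozenset."""
    subsets = chain.from_iterable(
        combinations(rows, n) for n in range(len(rows) + 1)
    )
    return [frozenset(s) for s in subsets]


def keep(rows, predicate):
    """Filter: keep the rows for which the predicate holds."""
    return {row for row in rows if predicate(row)}


def predicate(row):
    # Deterministic: depends only on the row's own values.
    return row[0] == 1


relations = all_relations(universe)

# Compare filter-above with filter-below for every pair of relations,
# for both Intersect (&) and Except (-).
pushdown_is_safe = all(
    keep(l & r, predicate) == keep(l, predicate) & keep(r, predicate)
    and keep(l - r, predicate) == keep(l, predicate) - keep(r, predicate)
    for l in relations
    for r in relations
)
print(pushdown_is_safe)  # True
```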

@marmbrus
Contributor

ping :)

@yhuai
Contributor

yhuai commented Sep 18, 2015

@yjshen The fix is good. Can you address the comments?

@yhuai
Contributor

yhuai commented Sep 18, 2015

@yjshen I added the comments and created a new PR (#8823). Can you close this one?

asfgit pushed a commit that referenced this pull request Sep 18, 2015
…ct or Except #8742

Intersect and Except are both set operators, and they use all the columns to compare rows for equality. Pushing their Project parent down changes the relations they operate on, so it is not an equivalent transformation.

JIRA: https://issues.apache.org/jira/browse/SPARK-10539

I added some comments based on the fix of #8742.

Author: Yijie Shen <[email protected]>
Author: Yin Huai <[email protected]>

Closes #8823 from yhuai/fix_set_optimization.
asfgit pushed a commit that referenced this pull request Sep 18, 2015

(cherry picked from commit c6f8135)
Signed-off-by: Yin Huai <[email protected]>

Conflicts:
	sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
@yjshen
Member Author

yjshen commented Sep 21, 2015

Thanks @yhuai, I'll close this one.

@yjshen yjshen closed this Sep 21, 2015
kiszk pushed a commit to kiszk/spark-gpu that referenced this pull request Dec 26, 2015
ashangit pushed a commit to ashangit/spark that referenced this pull request Oct 19, 2016

(cherry picked from commit c6f8135)
Signed-off-by: Yin Huai <[email protected]>

Conflicts:
	sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala

(cherry picked from commit 3df52cc)