@marmbrus
Contributor

This is based on the bug and test case proposed by @viirya. See #5203 for an excellent description of the problem.

TL;DR: The problem occurs because `groupBy(String)` calls `resolve`, which returns an `AttributeReference`. However, this `AttributeReference` is based on an analyzed plan that is then thrown away. At execution time, we analyze the plan again. In the case of self-joins, each call to analyze produces a new tree for the left side of the join, which renders the previously returned `AttributeReference` invalid.

As a fix, I propose we keep the analyzed plan instead of the unresolved plan inside of a DataFrame.
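
To make the failure mode concrete, here is a minimal toy model in Python. It is not Spark code: `analyze`, `LazyFrame`, `EagerFrame`, and `can_execute` are hypothetical names invented for illustration, and the model assumes that every call to `analyze` hands out fresh IDs, which per the description above is what happens in Spark for the left side of a self-join. It only sketches why a reference resolved against a discarded analysis goes stale, and why keeping the analyzed plan avoids that.

```python
import itertools
from dataclasses import dataclass

# Global counter standing in for the analyzer's source of fresh expression IDs.
_next_id = itertools.count()


@dataclass(frozen=True)
class AttributeReference:
    """Toy stand-in for a resolved column: a name plus a unique ID."""
    name: str
    expr_id: int


def analyze(column_names):
    """Hypothetical analyzer: every call assigns fresh IDs to all columns.

    Assumption: this mimics the self-join case, where each analysis
    produces a new tree.
    """
    return {name: AttributeReference(name, next(_next_id)) for name in column_names}


class LazyFrame:
    """Keeps only the unresolved plan and re-analyzes on every access."""

    def __init__(self, column_names):
        self.column_names = list(column_names)

    @property
    def analyzed(self):
        return analyze(self.column_names)

    def resolve(self, name):
        # The analysis used here is discarded as soon as this returns.
        return self.analyzed[name]

    def can_execute(self, ref):
        # "Execution" analyzes again and checks that the reference still exists.
        return ref in self.analyzed.values()


class EagerFrame(LazyFrame):
    """Analyzes once and keeps the analyzed plan, as the fix proposes."""

    def __init__(self, column_names):
        super().__init__(column_names)
        self._analyzed = analyze(self.column_names)

    @property
    def analyzed(self):
        return self._analyzed


lazy = LazyFrame(["key", "value"])
stale = lazy.resolve("key")
print(lazy.can_execute(stale))   # False: the IDs changed on re-analysis

eager = EagerFrame(["key", "value"])
kept = eager.resolve("key")
print(eager.can_execute(kept))   # True: resolution and execution share one analysis
```

The only difference between the two classes is where the analysis result lives. In the toy model, resolving and executing against the same stored analysis is enough to keep the reference valid.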

@SparkQA

SparkQA commented Mar 26, 2015

Test build #29258 has finished for PR 5217 at commit dd4dec1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 27, 2015

Test build #29261 has finished for PR 5217 at commit 1f98e2d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Mar 27, 2015
This is based on the bug and test case proposed by viirya.  See #5203 for an excellent description of the problem.

TL;DR: The problem occurs because the function `groupBy(String)` calls `resolve`, which returns an `AttributeReference`.  However, this `AttributeReference` is based on an analyzed plan which is thrown away.  At execution time, we once again analyze the plan.  However, in the case of self-joins, each call to analyze will produce a new tree for the left side of the join, rendering the previously returned `AttributeReference` invalid.

As a fix, I propose we keep the analyzed plan instead of the unresolved plan inside of a `DataFrame`.

Author: Michael Armbrust <[email protected]>

Closes #5217 from marmbrus/preanalyzer and squashes the following commits:

1f98e2d [Michael Armbrust] revert change
dd4dec1 [Michael Armbrust] Use the analyzed plan in DataFrame
089c52e [Michael Armbrust] WIP

(cherry picked from commit 5d9c37c)
Signed-off-by: Michael Armbrust <[email protected]>
@asfgit asfgit closed this in 5d9c37c Mar 27, 2015
@marmbrus marmbrus deleted the preanalyzer branch August 3, 2015 22:55