Skip to content

Conversation

@cloud-fan
Copy link
Contributor

When resolve Sort with Project's output, we should use the base attribute but not the GetField chain in Project's output.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@chenghao-intel
Copy link
Contributor

@cloud-fan
Copy link
Contributor Author

Hi @chenghao-intel , your test is failed on my code, and I study into it, here is my thoughts.
First, I think the LogicalPlan#resolveChildren logic is not right for Sort.
For example, select a.b from t order by b.a on data {"a": {"b": 1}, "b": {"a": 1}}, currently we will try to resolve "b.a" on top of "a.b" instead of colunm "b", which is wrong.
Second, in ResolveSortReferences, we may expand the projection list if the field used by order by is not present. However, for something like "a.b.c", what should we add to projection? Just the bottom relation "a" or the whole expression "a.b.c"?
@marmbrus What do you think?
I'll dig into it tomorrow.

@chenghao-intel
Copy link
Contributor

Why not review #4892 for me? :)

@marmbrus
Copy link
Contributor

marmbrus commented Mar 5, 2015

test this please

@SparkQA
Copy link

SparkQA commented Mar 5, 2015

Test build #28305 has started for PR 4904 at commit 3eedbfc.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Mar 5, 2015

Test build #28305 has finished for PR 4904 at commit 3eedbfc.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28305/
Test FAILed.

@marmbrus
Copy link
Contributor

marmbrus commented Mar 5, 2015

Since I'd really like to include this in the next RC, I've opened #4918 with the style error fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather use checkAnswer for these tests, especially if we are going to put them in SQLQuerySuite. When check answer fails it'll give nice exceptions and then we test end to end.

asfgit pushed a commit that referenced this pull request Mar 5, 2015
Based on #4904 with style errors fixed.

`LogicalPlan#resolve` will not only produce `Attribute`, but also "`GetField` chain".
So in `ResolveSortReferences`, after resolve the ordering expressions, we should not just collect the `Attribute` results, but also `Attribute` at the bottom of "`GetField` chain".

Author: Wenchen Fan <[email protected]>
Author: Michael Armbrust <[email protected]>

Closes #4918 from marmbrus/pr/4904 and squashes the following commits:

997f84e [Michael Armbrust] fix style
3eedbfc [Wenchen Fan] fix 6145

(cherry picked from commit 5873c71)
Signed-off-by: Michael Armbrust <[email protected]>
asfgit pushed a commit that referenced this pull request Mar 5, 2015
Based on #4904 with style errors fixed.

`LogicalPlan#resolve` will not only produce `Attribute`, but also "`GetField` chain".
So in `ResolveSortReferences`, after resolve the ordering expressions, we should not just collect the `Attribute` results, but also `Attribute` at the bottom of "`GetField` chain".

Author: Wenchen Fan <[email protected]>
Author: Michael Armbrust <[email protected]>

Closes #4918 from marmbrus/pr/4904 and squashes the following commits:

997f84e [Michael Armbrust] fix style
3eedbfc [Wenchen Fan] fix 6145
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just filter out everything except AttributeReference? I don't know all corner cases of ORDER BY and feel this way is safer.

@cloud-fan
Copy link
Contributor Author

retest it please.

@cloud-fan
Copy link
Contributor Author

Seems Jenkins doesn't listen to me :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels kind of hacky to me as its mixing the particulars of analysis for SQL into the logical plan. Could we instead just make resolve not do partial resolution, where we can resolve the base attribute but not the GetFields that are on top. I think this change is the root cause of the regression.

@marmbrus
Copy link
Contributor

marmbrus commented Mar 6, 2015

ok to test

@SparkQA
Copy link

SparkQA commented Mar 6, 2015

Test build #28354 has started for PR 4904 at commit e383154.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Mar 6, 2015

Test build #28354 has finished for PR 4904 at commit e383154.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class BinaryClassificationMetrics(JavaModelWrapper):

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28354/
Test FAILed.

@cloud-fan
Copy link
Contributor Author

Hi @marmbrus , it feels hard for me to resolve the base attribute but not the GetFields that are on top. When we get into LogicalPlan#resolve, the Attributes are all AttributeReference, and we don't know whether it's a base attribute or GetField chain. To make it less hacky, I moved the particular code to Analyzer, is that OK?

@SparkQA
Copy link

SparkQA commented Mar 8, 2015

Test build #28372 has started for PR 4904 at commit eab0b43.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Mar 8, 2015

Test build #28372 has finished for PR 4904 at commit eab0b43.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28372/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Mar 16, 2015

Test build #28634 has started for PR 4904 at commit 7b56fb9.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Mar 16, 2015

Test build #28634 has finished for PR 4904 at commit 7b56fb9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28634/
Test PASSed.

@cloud-fan
Copy link
Contributor Author

Hi @marmbrus , any more comments?

@cloud-fan
Copy link
Contributor Author

ping @marmbrus

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will not work when we have a order by in a subquery and the outer query block try to access the those filtered out fields.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. You are creating a fake Sort node in order to resolve the attribute. If the attribute is resolved, use the resolved one to replace the unresolved one.

@marmbrus
Copy link
Contributor

Thanks for working on this! This is a pretty tricky case that involves several rules interacting, and your test cases have been incredibly helpful.

Here are my thoughts on the proposed solution:

  • My biggest problem is that we are spreading the logic about Sort's special resolution across several rules. I think it can be confusing to add things in one rule and then filter them out in another. Instead, I propose that we just finalize the resolution in the ResolveSortReferences rules, instead of adding projects there while deferring the actual resolution to another rule. At the point where we construct the projects, we know the resolution, so I think we can just use it there. This way even if we introduce ambiguity, it does not matter.
  • Separately, I think it is a problem that with our current logic we break UnresolvedReferences into UnresolvedGetFields, before know if it is possible for them to be successfully resolved. The problem here is that . is overloaded both to do scoping and to do extraction of fields. I think UnresolvedGetField is a good abstraction to have, but should only be used by the parser to handle cases where it must be field extraction (i.e. because the chain has been broken by a GetItem or other expression).

I've taken your changes and made an alternative solution (#5189). Would appreciate your feedback there.

asfgit pushed a commit that referenced this pull request Mar 31, 2015
This PR is based on work by cloud-fan in #4904, but with two differences:
 - We isolate the logic for Sort's special handling into `ResolveSortReferences`
 - We avoid creating UnresolvedGetField expressions during resolution.  Instead we either resolve GetField or we return None.  This avoids us going down the wrong path early on.

Author: Michael Armbrust <[email protected]>

Closes #5189 from marmbrus/nestedOrderBy and squashes the following commits:

b8cae45 [Michael Armbrust] fix another test
0f36a11 [Michael Armbrust] WIP
91820cd [Michael Armbrust] Fix bug.
asfgit pushed a commit that referenced this pull request Mar 31, 2015
This PR is based on work by cloud-fan in #4904, but with two differences:
 - We isolate the logic for Sort's special handling into `ResolveSortReferences`
 - We avoid creating UnresolvedGetField expressions during resolution.  Instead we either resolve GetField or we return None.  This avoids us going down the wrong path early on.

Author: Michael Armbrust <[email protected]>

Closes #5189 from marmbrus/nestedOrderBy and squashes the following commits:

b8cae45 [Michael Armbrust] fix another test
0f36a11 [Michael Armbrust] WIP
91820cd [Michael Armbrust] Fix bug.

(cherry picked from commit cd48ca5)
Signed-off-by: Michael Armbrust <[email protected]>

Conflicts:
	sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala
@cloud-fan cloud-fan closed this Apr 8, 2015
@cloud-fan cloud-fan deleted the 6145 branch April 8, 2015 07:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants