Skip to content

Conversation

@tianyi
Copy link
Contributor

@tianyi tianyi commented Sep 26, 2014

we should add some judgements base on the datatype to handle some particular sql described in https://issues.apache.org/jira/browse/SPARK-3688

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rename this test...

@cloud-fan
Copy link
Contributor

Nice catch! I think the problem is: for a table(a string, b string), when we run "select a.b from test a join test b", we have 2 options to resolve a.b. One is "table a, column b", another is "table b, column a.b". Here we know that column a don't have a sub column b, so "table b, column a.b" is invalid, the only option is "table a, column b". However, the system didn't know this and report "ambiguousReferences".
But what if we really have a sub column b for column a? Should we report "ambiguousReferences"? @marmbrus @liancheng What do you think?

@tianyi
Copy link
Contributor Author

tianyi commented Sep 26, 2014

@cloud-fan
I think we should follow some orders to resolve a.b
First, if 'a' match a table name or alias, then we recognize a as a table.
if 'a' did not match any table name or alias, then we expect 'a' should be a structType column.

@cloud-fan
Copy link
Contributor

@tianyi Let me raise an example. For table: { a: { c: String }, b: String }, if we run select a.b from test a join test b, your PR will still give 2 options to resolve a.b: "table a, column b", "table b, column a.b" as a is StructType. Actually we can tell that a have a sub column c, not b, so "table b, column a.b" is invalid. But it's hard to tell whether the column can be resolved in LogicalPlan#resolve as column name can be very complicated like a.b[1].c.d[0]
I think a perfect solution is keep all the possible resolve path, and go one by one. If more than one path success, then throw an "ambiguousReferences" exception.

@liancheng
Copy link
Contributor

@cloud-fan Tried the following snippet with ambiguous references in Hive:

CREATE TABLE t1(a STRUCT<x: INT>, k INT);
CREATE TABLE t2(x INT);
SELECT a.x FROM t1 a JOIN t2 b ON a.k = b.x;

and Hive reports "Invalid column reference 'x'", which I think is reasonable. So it seems that Hive treats struct fields and table columns equally.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need the withName call anymore, since now we preserve case of attribute names by leveraging resolver to compare names.

@liancheng
Copy link
Contributor

@cloud-fan Sorry, made a mistake in the snippet I used, it should be:

CREATE TABLE t1(x INT);
CREATE TABLE t2(a STRUCT<x: INT>, k INT);
SELECT a.x FROM t1 a JOIN t2 b ON a.x = b.k;

And Hive can resolve a.x successfully.

@marmbrus
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented Sep 27, 2014

QA tests have started for PR 2542 at commit 252dbbb.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 27, 2014

Tests timed out after a configured wait of 120m.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20921/

@tianyi
Copy link
Contributor Author

tianyi commented Sep 28, 2014

Hi, @marmbrus . The current codes still have some bugs to fix, I talked @liancheng yesterday, I will push a update later.

@SparkQA
Copy link

SparkQA commented Sep 28, 2014

QA tests have started for PR 2542 at commit a018641.

  • This patch merges cleanly.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20925/

@cloud-fan
Copy link
Contributor

@liancheng Hmm..I don't have a hive environment for test...

CREATE TABLE t1(x INT);
CREATE TABLE t2(a STRUCT<x: INT>, k INT);
SELECT a.x FROM t1 a JOIN t2 b;

Without this PR, spark sql will report ambiguousReferences, but how hive resolve a.x? "table a, column x" or "table b, column a.x"?
Or do we have to add the ON statement?

@SparkQA
Copy link

SparkQA commented Sep 28, 2014

QA tests have started for PR 2542 at commit a018641.

  • This patch merges cleanly.

@tianyi
Copy link
Contributor Author

tianyi commented Sep 28, 2014

@cloud-fan, I think it is reasonable for return "ambiguous references" in the case you mentioned, because we can't make sure whether 'a' is a table alias or column name. In my last commits, spark will return "ambiguous references" for your case.

@cloud-fan
Copy link
Contributor

@tianyi

CREATE TABLE t1(x INT);
CREATE TABLE t2(a STRUCT<x: INT>, k INT);
SELECT a.x FROM t1 a JOIN t2 b ON a.x = b.k;

But hive can resolve this as @liancheng said. What's the magic here for the ON statement?

@SparkQA
Copy link

SparkQA commented Sep 28, 2014

Tests timed out after a configured wait of 120m.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20926/

@SparkQA
Copy link

SparkQA commented Sep 28, 2014

Tests timed out after a configured wait of 120m.

@SparkQA
Copy link

SparkQA commented Sep 28, 2014

QA tests have started for PR 2542 at commit e9cd8be.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 28, 2014

Tests timed out after a configured wait of 120m.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20929/

@SparkQA
Copy link

SparkQA commented Sep 28, 2014

QA tests have started for PR 2542 at commit e9cd8be.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 28, 2014

Tests timed out after a configured wait of 120m.

@SparkQA
Copy link

SparkQA commented Oct 29, 2014

Test build #22417 has finished for PR 2542 at commit efc2df3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22417/
Test FAILed.

@liancheng
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Oct 30, 2014

Test build #22517 has started for PR 2542 at commit b708fc7.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 30, 2014

Test build #22519 has started for PR 2542 at commit b708fc7.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 30, 2014

Test build #22517 has finished for PR 2542 at commit b708fc7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22517/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Oct 30, 2014

Test build #22519 has finished for PR 2542 at commit b708fc7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22519/
Test PASSed.

@tianyi
Copy link
Contributor Author

tianyi commented Oct 30, 2014

Hi, @marmbrus @liancheng
I had rebased this PR after #2762.
Any more comments on this?

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@liancheng
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented Nov 14, 2014

Test build #23398 has started for PR 2542 at commit b708fc7.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 15, 2014

Test build #23398 has finished for PR 2542 at commit b708fc7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23398/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Dec 17, 2014

Test build #548 has started for PR 2542 at commit b708fc7.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Dec 17, 2014

Test build #548 has finished for PR 2542 at commit b708fc7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Dec 17, 2014

Test build #549 has started for PR 2542 at commit b708fc7.

  • This patch does not merge cleanly.

@SparkQA
Copy link

SparkQA commented Dec 18, 2014

Test build #549 has finished for PR 2542 at commit b708fc7.

  • This patch fails PySpark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Dec 22, 2014

Test build #24696 has started for PR 2542 at commit b708fc7.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Dec 22, 2014

Test build #24696 has finished for PR 2542 at commit b708fc7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24696/
Test PASSed.

@tianyi
Copy link
Contributor Author

tianyi commented Feb 11, 2015

It's hard to rebase this branch due to so much changes in the latest master branch.
So I had opened a new PR #4524 for this Issue.

@tianyi tianyi closed this Feb 11, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants