Conversation

@imback82
Contributor

@imback82 imback82 commented Jan 30, 2020

What changes were proposed in this pull request?

This PR fixes the issue where queries with qualified columns like SELECT t.a FROM t would fail to resolve for v2 tables.

This PR allows qualified column names in queries, as in the following:

SELECT testcat.ns1.ns2.tbl.foo FROM testcat.ns1.ns2.tbl
SELECT ns1.ns2.tbl.foo FROM testcat.ns1.ns2.tbl
SELECT ns2.tbl.foo FROM testcat.ns1.ns2.tbl
SELECT tbl.foo FROM testcat.ns1.ns2.tbl

Why are the changes needed?

This fixes a bug: previously, column names could not be qualified in queries against v2 tables.

Does this PR introduce any user-facing change?

Yes, now users can qualify column names for v2 tables.

How was this patch tested?

Added new tests.

@SparkQA

SparkQA commented Jan 30, 2020

Test build #117534 has finished for PR 27391 at commit dd59446.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class AliasIdentifier(name: String, namespace: Seq[String])

checkAnswer(sql("select t.i from spark_catalog.default.t"), Row(1))
checkAnswer(sql("select default.t.i from spark_catalog.default.t"), Row(1))

// catalog name cannot be used for v1 tables.
Contributor Author

Maybe we should allow using catalog name even for v1 tables since we allow it for table name resolution? (It should be a simple change.)

Contributor

Makes sense. We can do it with another PR.

Contributor Author

Will do.

Contributor Author

Using a catalog name for v1 tables requires changes to the existing resolution rule (matchWithTwoOrLessQualifierParts) unless we fall back to the new rule. Since this will be a 3.1 feature, can I update matchWithTwoOrLessQualifierParts now? I wanted to make sure before getting started. Thanks!

Contributor

I'm not sure how you are going to implement it. Maybe we can discuss in your PR and decide if it should be 3.1 only or not.

Contributor Author

OK, sounds good.

@SparkQA

SparkQA commented Jan 30, 2020

Test build #117539 has finished for PR 27391 at commit e744f81.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 30, 2020

Test build #117541 has finished for PR 27391 at commit 631304a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 30, 2020

Test build #117536 has finished for PR 27391 at commit 713c0fb.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82
Contributor Author

retest this please

@SparkQA

SparkQA commented Jan 30, 2020

Test build #117559 has finished for PR 27391 at commit 631304a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82 imback82 changed the title [WIP][SPARK-30612][SQL] Resolve qualified column name with v2 tables [SPARK-30612][SQL] Resolve qualified column name with v2 tables Jan 30, 2020
@imback82
Contributor Author

@cloud-fan / @brkyvz This is now ready for review. Thanks!

Contributor

@brkyvz brkyvz left a comment

I think you're doing the replacement at the wrong level; it's being pushed down far too much, requiring unnecessary changes. We actually have a great place to introduce the SubqueryAlias, and that's ResolveTables in the Analyzer. Any UnresolvedRelation or UnresolvedV2Relation you get can be wrapped in a SubqueryAlias once resolved, and the great part is that you have all the qualifiers you need to accomplish this right there.

I'm also a bit uneasy performing such drastic changes in expression resolution this late in the game. Can we leave the existing code and simply perform an additive change? (We can follow up and clean things up after Spark 3.0.)

@brkyvz
Contributor

brkyvz commented Jan 31, 2020 via email

@SparkQA

SparkQA commented Jan 31, 2020

Test build #117583 has finished for PR 27391 at commit c414d7b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82
Contributor Author

You are right and I checked that v1 commands do not work with qualifiers. Thanks!

* Performs the lookup of DataSourceV2 Tables from v2 catalog.
*/
- private def lookupV2Relation(identifier: Seq[String]): Option[DataSourceV2Relation] =
+ private def lookupV2Relation(identifier: Seq[String]): Option[LogicalPlan] =
Contributor

The return type can still be DataSourceV2Relation?

@cloud-fan
Contributor

cloud-fan commented Jan 31, 2020

Agree with @brkyvz that we mostly only care about column qualifiers in SELECT, not DDL/DML commands. I think we only need to add SubqueryAlias when resolving UnresolvedRelation to v2 relations, to be consistent with v1 tables. An exception is UPDATE/DELETE/MERGE, where the conditions may contain qualified column names.

In general, this PR extends #17185 to support n-part qualifier. The change here makes sense to me.
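The approach suggested above can be sketched roughly as follows. This is an illustrative, self-contained model only, not Spark's actual code: the types below (`UnresolvedRelation`, `DataSourceV2Relation`, `SubqueryAlias`) are simplified stand-ins for the Catalyst classes of the same names, and `lookupV2Relation` is a hypothetical lookup that in Spark would consult the v2 catalog.

```scala
// Stand-in types modeling the suggestion: once a v2 relation resolves,
// wrap it in a SubqueryAlias carrying the full multi-part identifier,
// so column qualifiers (catalog.ns1.ns2.tbl) are attached in one place.
sealed trait LogicalPlan
case class UnresolvedRelation(multipartIdentifier: Seq[String]) extends LogicalPlan
case class DataSourceV2Relation(name: String) extends LogicalPlan
case class SubqueryAlias(identifier: Seq[String], child: LogicalPlan) extends LogicalPlan

// Hypothetical lookup; the real rule would ask the v2 catalog.
def lookupV2Relation(ident: Seq[String]): Option[DataSourceV2Relation] =
  Some(DataSourceV2Relation(ident.mkString(".")))

// The ResolveTables-style rule: wrap the resolved relation with all
// of its qualifier parts; leave the plan unchanged if lookup fails.
def resolveTables(plan: LogicalPlan): LogicalPlan = plan match {
  case u @ UnresolvedRelation(ident) =>
    lookupV2Relation(ident).map(rel => SubqueryAlias(ident, rel)).getOrElse(u)
  case other => other
}
```

Because the alias keeps every identifier part, later attribute resolution can match any suffix of `testcat.ns1.ns2.tbl` against a column reference.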

@imback82
Contributor Author

imback82 commented Feb 3, 2020

retest this please

@SparkQA

SparkQA commented Feb 4, 2020

Test build #117787 has finished for PR 27391 at commit 15c7003.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82
Contributor Author

imback82 commented Feb 4, 2020

retest this please

@cloud-fan
Contributor

LGTM except a few minor comments, the major one being: https://github.com/apache/spark/pull/27391/files#r374470850

Thanks for fixing it!

@SparkQA

SparkQA commented Feb 4, 2020

Test build #117792 has finished for PR 27391 at commit 15c7003.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@brkyvz brkyvz left a comment

LGTM. I'd love to see one more test case, but that can be added in a follow up. Thanks @imback82

class AttributeResolutionSuite extends SparkFunSuite {
val resolver = caseInsensitiveResolution

test("basic attribute resolution with namespaces") {
Contributor

Can you please add a test where the table name and the column name are the same, to make sure resolution works? Something like:

val attrs = Seq(AttributeReference("t", IntegerType)(qualifier = Seq("ns1", "ns2", "t")))
attrs.resolve(Seq("ns1", "ns2", "t"), resolver) match {
      case Some(attr) => // expected: resolves to the attribute
      case _ => fail()
}

@SparkQA

SparkQA commented Feb 5, 2020

Test build #117865 has finished for PR 27391 at commit 24daf5f.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

attrs.forall(_.qualifier.length <= 2)

/** Match attributes for the case where all qualifiers in `attrs` have 2 or less parts. */
private def matchWithTwoOrLessPartQualifiers(
Contributor

nit: matchWithTwoOrLessQualifierParts

/**
* Match attributes for the case where at least one qualifier in `attrs` has more than 2 parts.
*/
private def matchWithThreeOrMorePartQualifiers(
Contributor

ditto
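For intuition, the suffix-based matching that these helpers implement for n-part qualifiers can be sketched as below. This is a simplified, hypothetical model: `Attr` and `resolve` are stand-ins, not Spark's `AttributeReference` or its `resolve` method, and the real code additionally handles ambiguity and nested fields.

```scala
// Simplified model: an attribute carries a qualifier such as
// Seq("testcat", "ns1", "ns2", "tbl").
case class Attr(name: String, qualifier: Seq[String])

// `nameParts` is the reference as written in the query, e.g.
// Seq("ns2", "tbl", "foo"). It matches an attribute when the last part
// equals the column name and the remaining parts are a suffix of the
// attribute's qualifier (case-insensitive comparison here).
def resolve(nameParts: Seq[String], attrs: Seq[Attr]): Option[Attr] = {
  val colName = nameParts.last
  val qualifierParts = nameParts.dropRight(1)
  attrs.find { a =>
    a.name.equalsIgnoreCase(colName) &&
      a.qualifier.takeRight(qualifierParts.length).map(_.toLowerCase) ==
        qualifierParts.map(_.toLowerCase)
  }
}
```

With an attribute `foo` qualified as `testcat.ns1.ns2.tbl`, all four reference forms from the PR description (`tbl.foo` up through `testcat.ns1.ns2.tbl.foo`) resolve to it, while a reference with a non-matching qualifier part does not.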

@cloud-fan
Contributor

LGTM, let's add the test and get this merged!

@SparkQA

SparkQA commented Feb 5, 2020

Test build #117892 has finished for PR 27391 at commit c40895f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Feb 5, 2020

Test build #117910 has finished for PR 27391 at commit c40895f.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Feb 5, 2020

Test build #117927 has finished for PR 27391 at commit c40895f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master/3.0!

cloud-fan pushed a commit that referenced this pull request Feb 6, 2020

Closes #27391 from imback82/qualified_col.

Authored-by: Terry Kim <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit c27a616)
Signed-off-by: Wenchen Fan <[email protected]>
@brkyvz
Contributor

brkyvz commented Feb 6, 2020

Thanks @imback82 and @cloud-fan!

@imback82
Contributor Author

imback82 commented Feb 6, 2020

Thanks @brkyvz and @cloud-fan for the review!
