[SPARK-30612][SQL] Resolve qualified column name with v2 tables #27391
Conversation
A review comment (resolved) was left on sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala.
Test build #117534 has finished for PR 27391 at commit
| checkAnswer(sql("select t.i from spark_catalog.default.t"), Row(1)) | ||
| checkAnswer(sql("select default.t.i from spark_catalog.default.t"), Row(1)) | ||
|
|
||
| // catalog name cannot be used for v1 tables. |
Maybe we should allow using catalog name even for v1 tables since we allow it for table name resolution? (It should be a simple change.)
Makes sense. We can do it with another PR.
Will do.
Using the catalog name for v1 tables requires changes to the existing resolution rule (matchWithTwoOrLessQualifierParts) unless we fall back to the new rule. Since this will be a 3.1 feature, can I update matchWithTwoOrLessQualifierParts now? I wanted to make sure before I get started. Thanks!
I'm not sure how you are going to implement it. Maybe we can discuss in your PR and decide if it should be 3.1 only or not.
OK, sounds good.
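For readers skimming the thread, here is a minimal, self-contained sketch of the behavior being discussed, based on the test quoted above. The table, data, and app name are illustrative; the commented-out query is the case that does not yet resolve for v1 tables and would be addressed in a follow-up.

```scala
import org.apache.spark.sql.SparkSession

object QualifiedColumnV1Example {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("qualified-columns").getOrCreate()
    import spark.sql

    // A v1 table living in the session catalog (spark_catalog).
    sql("CREATE TABLE default.t(i INT) USING parquet")
    sql("INSERT INTO default.t VALUES (1)")

    sql("SELECT t.i FROM spark_catalog.default.t").show()         // table-qualified column: resolves
    sql("SELECT default.t.i FROM spark_catalog.default.t").show() // database-qualified column: resolves

    // Per the quoted test, the catalog name cannot be used to qualify a v1 table's
    // column yet, so the following would fail to resolve:
    // sql("SELECT spark_catalog.default.t.i FROM spark_catalog.default.t").show()

    spark.stop()
  }
}
```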
Test build #117539 has finished for PR 27391 at commit
Test build #117541 has finished for PR 27391 at commit
Test build #117536 has finished for PR 27391 at commit
retest this please
Test build #117559 has finished for PR 27391 at commit
@cloud-fan / @brkyvz This is now ready for review. Thanks!
brkyvz left a comment:
I think you're doing the replacement at the wrong level. It's being pushed down far too much, requiring unnecessary changes. We actually have a great place to introduce the SubqueryAlias, and that's at ResolveTables in the Analyzer. Any UnresolvedRelation or UnresolvedV2Relation you get can be wrapped in a SubqueryAlias once resolved, and the great part is that you have all the qualifiers you need to accomplish this right there.
I'm also a bit uneasy about performing such drastic changes in expression resolution this late in the game. Can we leave the existing code and simply perform an additive change? (We can follow up and clean things up after Spark 3.0.)
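To make the suggestion concrete, here is a rough, self-contained model of the idea. The types and names below are my own simplifications, not the actual Catalyst classes or the real ResolveTables rule: once the analyzer has looked up a relation, it immediately wraps the result in an alias node that carries the full multi-part name, so column-qualifier matching later has everything it needs.

```scala
// Toy logical-plan nodes standing in for Catalyst's UnresolvedRelation,
// DataSourceV2Relation and SubqueryAlias (simplified, illustrative names).
object ResolveTablesSketch {
  sealed trait Plan
  case class UnresolvedRel(multipartIdentifier: Seq[String]) extends Plan
  case class ResolvedTable(name: String, columns: Seq[String]) extends Plan
  case class Alias(qualifier: Seq[String], child: Plan) extends Plan

  // Stand-in for the catalog lookup the analyzer performs.
  def lookupTable(ident: Seq[String]): Option[ResolvedTable] =
    if (ident.nonEmpty) Some(ResolvedTable(ident.last, Seq("foo"))) else None

  // The rule: resolve the relation and wrap it right here, so the alias carries the
  // full multi-part name instead of pushing qualifier handling into expression resolution.
  def resolveTables(plan: Plan): Plan = plan match {
    case u @ UnresolvedRel(ident) =>
      lookupTable(ident).map(t => Alias(ident, t)).getOrElse(u)
    case other => other
  }

  // Example: resolveTables(UnresolvedRel(Seq("testcat", "ns1", "ns2", "tbl")))
  //   => Alias(Seq("testcat", "ns1", "ns2", "tbl"), ResolvedTable("tbl", Seq("foo")))
}
```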
Review comments were left on:
- sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala (outdated)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
- ...alyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala (outdated)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (outdated)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala (outdated)
I don't think so for ALTER TABLE. You don't have any ambiguity there with respect to columns, so you wouldn't need to add qualifiers to the columns. I don't think v1 commands work with qualifiers either, but we should check.
On Thu, Jan 30, 2020, 2:15 PM Terry Kim wrote, commenting on sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala (#27391 (comment)):

```diff
@@ -389,7 +389,7 @@ case class DropTable(
 case class AlterTable(
     catalog: TableCatalog,
     ident: Identifier,
-    table: NamedRelation,
+    table: LogicalPlan,
```

Don't we need to wrap this with SubqueryAlias (which is not a NamedRelation)?
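A small illustration of the point about ALTER TABLE, assuming standard Spark DDL syntax and an existing `spark` session; the table and column names are made up. The column reference can only belong to the one table being altered, so there is nothing for a qualifier to disambiguate.

```scala
// The column is named bare; qualifying it (e.g. tbl.foo) is unnecessary because
// ALTER TABLE only ever refers to columns of its single target table.
spark.sql("ALTER TABLE testcat.ns1.ns2.tbl ALTER COLUMN foo COMMENT 'no qualifier needed'")
```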
Test build #117583 has finished for PR 27391 at commit
You are right and I checked that v1 commands do not work with qualifiers. Thanks!
A review comment (resolved, outdated) was left on sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala.
```diff
  * Performs the lookup of DataSourceV2 Tables from v2 catalog.
  */
- private def lookupV2Relation(identifier: Seq[String]): Option[DataSourceV2Relation] =
+ private def lookupV2Relation(identifier: Seq[String]): Option[LogicalPlan] =
```
The return type can still be DataSourceV2Relation?
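Reusing the toy types from the sketch earlier in the thread (again, not the real Catalyst signatures), the reviewer's point is that the lookup helper can keep its precise return type and the wrapping can happen at the call site:

```scala
object LookupReturnTypeSketch {
  import ResolveTablesSketch._

  // Keep the lookup's return type narrow...
  def lookupV2Relation(identifier: Seq[String]): Option[ResolvedTable] =
    lookupTable(identifier)

  // ...and wrap into the qualifier-carrying alias at the call site instead.
  def resolveRelation(identifier: Seq[String]): Option[Plan] =
    lookupV2Relation(identifier).map(t => Alias(identifier, t))
}
```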
Further review comments (resolved) were left on:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala (outdated)
- ...lyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/AttributeResolutionSuite.scala (outdated)
- ...lyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/AttributeResolutionSuite.scala (outdated)
Agree with @brkyvz that we mostly only care about column qualifiers in SELECT, not DDL/DML commands. I think we only need to add … In general, this PR extends #17185 to support n-part qualifiers. The change here makes sense to me.
retest this please
Test build #117787 has finished for PR 27391 at commit
retest this please
More review comments (resolved, outdated) were left on sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala and sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala.
LGTM except a few minor comments; the major one is https://github.com/apache/spark/pull/27391/files#r374470850. Thanks for fixing it!
Test build #117792 has finished for PR 27391 at commit
LGTM. I'd love to see one more test case, but that can be added in a follow up. Thanks @imback82
```scala
class AttributeResolutionSuite extends SparkFunSuite {
  val resolver = caseInsensitiveResolution

  test("basic attribute resolution with namespaces")
```
Can you please add a test where the table name and the column name are the same, and make sure resolution works. Something like:

```scala
val attrs = Seq(AttributeReference("t", IntegerType)(qualifier = Seq("ns1", "ns2", "t")))
attrs.resolve(Seq("ns1", "ns2", "t"), resolver) match {
  case Some(attr) => // assert on the resolved attribute here
  case _ => fail()
}
```
Test build #117865 has finished for PR 27391 at commit
```scala
attrs.forall(_.qualifier.length <= 2)

/** Match attributes for the case where all qualifiers in `attrs` have 2 or less parts. */
private def matchWithTwoOrLessPartQualifiers(
```
nit: matchWithTwoOrLessQualifierParts
```scala
/**
 * Match attributes for the case where at least one qualifier in `attrs` has more than 2 parts.
 */
private def matchWithThreeOrMorePartQualifiers(
```
ditto
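For readers following the naming discussion, here is a rough, standalone sketch (my own simplification, not the actual Spark implementation) of what these helpers are about: a column reference's qualifier parts are matched as a suffix of the attribute's qualifier, which is what lets `tbl.foo`, `ns2.tbl.foo`, `ns1.ns2.tbl.foo`, and `testcat.ns1.ns2.tbl.foo` all resolve to the same column.

```scala
object QualifierSuffixMatchSketch {
  // An attribute `foo` of table testcat.ns1.ns2.tbl carries the qualifier
  // Seq("testcat", "ns1", "ns2", "tbl"); a reference such as ns2.tbl.foo splits into
  // qualifier Seq("ns2", "tbl") plus name "foo" and matches if that qualifier is a
  // suffix of the attribute's qualifier (nested fields and ambiguity are ignored here).
  case class Attr(name: String, qualifier: Seq[String])

  def matches(nameParts: Seq[String], attr: Attr): Boolean = {
    val qualifier = nameParts.dropRight(1)
    val name = nameParts.last
    name.equalsIgnoreCase(attr.name) &&
      attr.qualifier.map(_.toLowerCase).endsWith(qualifier.map(_.toLowerCase))
  }

  def main(args: Array[String]): Unit = {
    val foo = Attr("foo", Seq("testcat", "ns1", "ns2", "tbl"))
    assert(matches(Seq("tbl", "foo"), foo))
    assert(matches(Seq("ns2", "tbl", "foo"), foo))
    assert(matches(Seq("testcat", "ns1", "ns2", "tbl", "foo"), foo))
    assert(!matches(Seq("other", "tbl", "foo"), foo))
    println("all matched as expected")
  }
}
```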
LGTM, let's add the test and get this merged!
Test build #117892 has finished for PR 27391 at commit
retest this please
Test build #117910 has finished for PR 27391 at commit
retest this please
Test build #117927 has finished for PR 27391 at commit
thanks, merging to master/3.0!
### What changes were proposed in this pull request?

This PR fixes the issue where queries with qualified columns like `SELECT t.a FROM t` would fail to resolve for v2 tables.

This PR allows qualified column names in queries as follows:
```SQL
SELECT testcat.ns1.ns2.tbl.foo FROM testcat.ns1.ns2.tbl
SELECT ns1.ns2.tbl.foo FROM testcat.ns1.ns2.tbl
SELECT ns2.tbl.foo FROM testcat.ns1.ns2.tbl
SELECT tbl.foo FROM testcat.ns1.ns2.tbl
```

### Why are the changes needed?

This is a bug because you cannot qualify column names in queries.

### Does this PR introduce any user-facing change?

Yes, now users can qualify column names for v2 tables.

### How was this patch tested?

Added new tests.

Closes #27391 from imback82/qualified_col.

Authored-by: Terry Kim <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit c27a616)
Signed-off-by: Wenchen Fan <[email protected]>
Thanks @imback82 and @cloud-fan!
Thanks @brkyvz and @cloud-fan for the review!