[SPARK-25988] [SQL] Keep names unchanged when deduplicating the column names in Analyzer #22990

gatorsmile · 2018-11-09T05:36:21Z

What changes were proposed in this pull request?

When a query does not spell a column name with consistent letter case, users might hit various errors. Below is a typical test failure they can hit:

```
Expected only partition pruning predicates: ArrayBuffer(isnotnull(tdate#237), (cast(tdate#237 as string) >= 2017-08-15));
org.apache.spark.sql.AnalysisException: Expected only partition pruning predicates: ArrayBuffer(isnotnull(tdate#237), (cast(tdate#237 as string) >= 2017-08-15));
	at org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.prunePartitionsByFilter(ExternalCatalogUtils.scala:146)
	at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.listPartitionsByFilter(InMemoryCatalog.scala:560)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:925)
```
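To make the failure mode concrete, here is a toy Scala model. It is not Spark's Analyzer or catalog code: `Attr`, `Predicate`, `nonPartitionPredicates`, `dedupReplacing`, and `dedupKeepingName` are hypothetical names. The sketch assumes (a) that the partition-predicate check compares the names referenced by a predicate with the table's partition column names case-sensitively, and (b) that, before this fix, deduplicating a conflicting attribute could swap in an attribute spelled with a different case, which is what the title's "keep names unchanged" suggests. The spellings `TDATE`/`tdate` below are illustrative only.

```scala
// Toy model (hypothetical, NOT Spark's actual classes) of how a case-only
// rename during attribute deduplication can break a partition-predicate check.
object PartitionPruningModel {
  // Hypothetical stand-in for a resolved column reference: a name plus a unique id.
  final case class Attr(name: String, exprId: Long)

  // Hypothetical stand-in for a filter predicate; only the referenced columns matter here.
  final case class Predicate(sql: String, references: Set[Attr])

  // Returns the predicates that reference anything other than partition columns.
  // Names are compared with plain String equality, i.e. case-sensitively (assumption).
  def nonPartitionPredicates(
      partitionColumnNames: Set[String],
      predicates: Seq[Predicate]): Seq[Predicate] =
    predicates.filterNot(p => p.references.map(_.name).subsetOf(partitionColumnNames))

  // Variant A (hypothetical): replace the attribute wholesale, so its name
  // takes whatever spelling the replacement attribute happens to carry.
  def dedupReplacing(original: Attr, replacement: Attr): Attr = replacement

  // Variant B (hypothetical): keep the original name and take only the new id.
  def dedupKeepingName(original: Attr, replacement: Attr): Attr =
    original.copy(exprId = replacement.exprId)

  def main(args: Array[String]): Unit = {
    val partitionCols = Set("TDATE") // partition column as declared in the catalog
    val original = Attr("TDATE", 1L) // attribute resolved from the table
    val fresh = Attr("tdate", 2L)    // fresh attribute spelled as in the query

    val replaced = dedupReplacing(original, fresh)
    val kept = dedupKeepingName(original, fresh)

    val before = Predicate(s"isnotnull(${replaced.name})", Set(replaced))
    val after = Predicate(s"isnotnull(${kept.name})", Set(kept))

    // prints List(isnotnull(tdate)): the check would reject this predicate
    println(nonPartitionPredicates(partitionCols, Seq(before)).map(_.sql))
    // prints List(): the predicate is recognized as partition-only
    println(nonPartitionPredicates(partitionCols, Seq(after)).map(_.sql))
  }
}
```

Under these assumptions, only the variant that keeps the original name and updates just the id leaves the predicate recognized as a partition-pruning predicate.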

How was this patch tested?

Added two test cases.

gatorsmile · 2018-11-09T05:36:36Z

cc @cloud-fan

cloud-fan · 2018-11-09T05:54:16Z

sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala

```diff
+            |SELECT  N.tdate, EX AS new_ex
+            |FROM tab1 N
+            |JOIN tab2 Z
+            |ON      N.tdate         = Z.tdate
```


nit: ON N.tdate = Z.tdate

cloud-fan · 2018-11-09T05:55:43Z

sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala

```diff
    }
  }
+
+  test("self join with aliases on partitioned tables #1") {
```


let's put the JIRA ticket number in the test name

cloud-fan · 2018-11-09T05:55:59Z

good catch! LGTM

SparkQA · 2018-11-09T07:30:25Z

Test build #98638 has finished for PR 22990 at commit 17b725c.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-11-09T08:05:02Z

Test build #98642 has finished for PR 22990 at commit 52f2b1e.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2018-11-09T08:48:54Z

retest this please

SparkQA · 2018-11-09T12:15:43Z

Test build #98650 has finished for PR 22990 at commit 52f2b1e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

… names in Analyzer

## What changes were proposed in this pull request?

When the queries do not use the column names with the same case, users might hit various errors. Below is a typical test failure they can hit.

```
Expected only partition pruning predicates: ArrayBuffer(isnotnull(tdate#237), (cast(tdate#237 as string) >= 2017-08-15));
org.apache.spark.sql.AnalysisException: Expected only partition pruning predicates: ArrayBuffer(isnotnull(tdate#237), (cast(tdate#237 as string) >= 2017-08-15));
	at org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.prunePartitionsByFilter(ExternalCatalogUtils.scala:146)
	at org.apache.spark.sql.catalyst.catalog.InMemoryCatalog.listPartitionsByFilter(InMemoryCatalog.scala:560)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:925)
```

## How was this patch tested?

Added two test cases.

Closes #22990 from gatorsmile/fix1283.

Authored-by: gatorsmile <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
(cherry picked from commit 657fd00)
Signed-off-by: gatorsmile <[email protected]>

gatorsmile · 2018-11-09T16:38:31Z

Thanks! Merged to master/2.4

gatorsmile added 2 commits November 8, 2018 21:27

fix (5e9f6f3)

fix (17b725c)

cloud-fan reviewed Nov 9, 2018

style fix (52f2b1e)

asfgit closed this in 657fd00 Nov 9, 2018

3 participants
