[SPARK-26990][SQL][BACKPORT-2.4] FileIndex: use user specified field names if possible #23909

bersprockets · 2019-02-27T18:49:04Z

What changes were proposed in this pull request?

Back-port of #23894 to branch-2.4.

With the following file structure:

/tmp/data
└── a=5

In the previous release:

scala> spark.read.schema("A int, ID long").parquet("/tmp/data/").printSchema
root
 |-- ID: long (nullable = true)
 |-- A: integer (nullable = true)

While in current code:

scala> spark.read.schema("A int, ID long").parquet("/tmp/data/").printSchema
root
 |-- ID: long (nullable = true)
 |-- a: integer (nullable = true)

The partition column name a differs from the name A that the user specified. This PR fixes that case so the user-specified field name is used where possible, which is more user-friendly.
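The intended behavior can be sketched in plain Scala, without a Spark dependency. This is only an illustration under assumptions: `PartitionNameResolution`, `Field`, and `resolvePartitionFields` are hypothetical names and not Spark's actual implementation, and matching is assumed to be case-insensitive unless case sensitivity is requested.

```scala
import java.util.Locale

// Hypothetical sketch, NOT Spark's actual FileIndex code.
object PartitionNameResolution {
  // Minimal stand-in for a schema field (name plus a type label).
  final case class Field(name: String, dataType: String)

  // For each partition field inferred from directory names (e.g. "a" from a=5),
  // prefer the user-specified field with a matching name, so the user's
  // spelling (e.g. "A") ends up in the resulting schema.
  def resolvePartitionFields(
      inferred: Seq[Field],
      userSpecified: Seq[Field],
      caseSensitive: Boolean = false): Seq[Field] = {
    // Normalize names for lookup only when matching is case-insensitive.
    def key(name: String): String =
      if (caseSensitive) name else name.toLowerCase(Locale.ROOT)

    val userByName: Map[String, Field] =
      userSpecified.map(f => key(f.name) -> f).toMap

    // Fall back to the inferred field when the user did not specify it.
    inferred.map(f => userByName.getOrElse(key(f.name), f))
  }
}
```

With the example above, an inferred field a and a user schema of "A int, ID long" would resolve to A under case-insensitive matching, while a case-sensitive comparison in this sketch would leave the inferred name a unchanged.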

Closes #23894 from gengliangwang/fileIndexSchema.

Authored-by: Gengliang Wang [email protected]
Signed-off-by: Wenchen Fan [email protected]

How was this patch tested?

Unit test


SparkQA · 2019-02-27T22:45:36Z

Test build #102830 has finished for PR 23909 at commit 06606d6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2019-02-28T01:35:25Z

ah sorry I thought it's a bug only in master, thanks for doing it!

cloud-fan · 2019-02-28T01:37:17Z

merging to 2.4!

…names if possible

Closes #23909 from bersprockets/backport-SPARK-26990.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

cloud-fan closed this Feb 28, 2019

bersprockets deleted the backport-SPARK-26990 branch May 1, 2022 01:17
