@bersprockets
Contributor

What changes were proposed in this pull request?

Back-port of #23894 to branch-2.4.

With the following file structure:

/tmp/data
└── a=5

In the previous release:

scala> spark.read.schema("A int, ID long").parquet("/tmp/data/").printSchema
root
 |-- ID: long (nullable = true)
 |-- A: integer (nullable = true)

While in current code:

scala> spark.read.schema("A int, ID long").parquet("/tmp/data/").printSchema
root
 |-- ID: long (nullable = true)
 |-- a: integer (nullable = true)

We can see that the partition column name `a` differs from the `A` that the user specified. This PR fixes this case, making the behavior more user-friendly.
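
To make the intended behavior concrete, here is a minimal sketch in plain Scala (no Spark dependency) of the idea: when a partition column inferred from a directory name matches a user-specified field case-insensitively, keep the user's spelling. The names `PartitionNameSketch`, `Field`, and `resolvePartitionNames` are hypothetical and used only for illustration; this is not the code changed by this PR, and it assumes case-insensitive name resolution.

```scala
import java.util.Locale

// Illustrative only: plain Scala, no Spark dependency.
// `Field` and `resolvePartitionNames` are hypothetical names, not Spark APIs.
object PartitionNameSketch {
  final case class Field(name: String, dataType: String)

  // For each partition column inferred from directory names, reuse the
  // spelling from the user-specified schema when the names match
  // case-insensitively; otherwise keep the inferred name unchanged.
  def resolvePartitionNames(
      userSchema: Seq[Field],
      inferredPartitions: Seq[Field]): Seq[Field] = {
    val userByLowerName =
      userSchema.map(f => f.name.toLowerCase(Locale.ROOT) -> f).toMap
    inferredPartitions.map { p =>
      userByLowerName.get(p.name.toLowerCase(Locale.ROOT)) match {
        case Some(userField) => p.copy(name = userField.name)
        case None            => p
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // Mirrors the example above: schema "A int, ID long", directory a=5.
    val userSchema = Seq(Field("A", "int"), Field("ID", "long"))
    val inferred   = Seq(Field("a", "int"))
    println(resolvePartitionNames(userSchema, inferred).map(_.name))
  }
}
```

With the inputs from the example above, the resolved partition column is named `A` rather than `a`. A partition column with no case-insensitive match in the user schema keeps its inferred name.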

Closes #23894 from gengliangwang/fileIndexSchema.

Authored-by: Gengliang Wang [email protected]
Signed-off-by: Wenchen Fan [email protected]

How was this patch tested?

Unit test

@SparkQA

SparkQA commented Feb 27, 2019

Test build #102830 has finished for PR 23909 at commit 06606d6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

ah sorry I thought it's a bug only in master, thanks for doing it!

@cloud-fan
Contributor

merging to 2.4!

cloud-fan pushed a commit that referenced this pull request Feb 28, 2019
…names if possible

## What changes were proposed in this pull request?

Back-port of #23894 to branch-2.4.

With the following file structure:
```
/tmp/data
└── a=5
```

In the previous release:
```
scala> spark.read.schema("A int, ID long").parquet("/tmp/data/").printSchema
root
 |-- ID: long (nullable = true)
 |-- A: integer (nullable = true)
```

While in current code:
```
scala> spark.read.schema("A int, ID long").parquet("/tmp/data/").printSchema
root
 |-- ID: long (nullable = true)
 |-- a: integer (nullable = true)
```

We can see that the partition column name `a` differs from the `A` that the user specified. This PR fixes this case, making the behavior more user-friendly.

Closes #23894 from gengliangwang/fileIndexSchema.

Authored-by: Gengliang Wang <gengliang.wangdatabricks.com>
Signed-off-by: Wenchen Fan <wenchendatabricks.com>

## How was this patch tested?

Unit test

Closes #23909 from bersprockets/backport-SPARK-26990.

Authored-by: Gengliang Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan cloud-fan closed this Feb 28, 2019
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 23, 2019
…names if possible
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Jul 25, 2019
…names if possible
kai-chi pushed a commit to kai-chi/spark that referenced this pull request Aug 1, 2019
…names if possible
@bersprockets bersprockets deleted the backport-SPARK-26990 branch May 1, 2022 01:17