[SPARK-18760][SQL] Consistent format specification for FileFormats #16187
cc @cloud-fan

Actually, if possible, please merge this into branch-2.1.

Test build #69772 has finished for PR 16187 at commit

Test build #69779 has started for PR 16187 at commit
`override def shortName(): String = "parquet"`

`- override def toString: String = "ParquetFormat"`

`+ override def toString: String = "Parquet"`
Just did a quick check. The other formats (JSON, CSV and ORC) use upper case. Do you think we need to make them consistent?
All the other formats are acronyms.
Yeah, got it. The official documentation for these three formats uses upper case.
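The convention settled on in this thread (acronyms fully upper-case, other names capitalized) can be sketched as a small Scala function. This is a hypothetical illustration only: `displayName` and `acronymFormats` are made-up names, and the reviewed diff simply hard-codes Parquet's label as a string literal rather than deriving it.

```scala
import java.util.Locale

// Hypothetical helper, not part of Spark: it only illustrates the naming
// convention agreed on in review. Real formats hard-code their labels.
val acronymFormats: Set[String] = Set("json", "csv", "orc")

// Acronyms are fully upper-cased (JSON, CSV, ORC); other names are capitalized.
def displayName(shortName: String): String =
  if (acronymFormats.contains(shortName)) shortName.toUpperCase(Locale.ROOT)
  else shortName.capitalize

val labels: Seq[String] = Seq("parquet", "text", "json", "csv", "orc").map(displayName)
println(labels.mkString(", ")) // Parquet, Text, JSON, CSV, ORC
```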
LGTM pending test.

Test build #3473 has finished for PR 16187 at commit

retest this please

Test build #69812 has finished for PR 16187 at commit

retest this please

Test build #3474 has finished for PR 16187 at commit

Test build #69834 has finished for PR 16187 at commit

retest this please

Test build #69848 has started for PR 16187 at commit

Test build #3478 has finished for PR 16187 at commit
Seems the PR description is outdated? We still depend on

Fixed. Let me merge this into master/branch-2.1.
## What changes were proposed in this pull request?
This patch makes the format specification in `explain` output consistent across file sources (Parquet and Text were the only two formats that differed from the rest):
Before:
```
scala> spark.read.text("test.text").explain()
== Physical Plan ==
*FileScan text [value#15] Batched: false, Format: org.apache.spark.sql.execution.datasources.text.TextFileFormat@xyz, Location: InMemoryFileIndex[file:/scratch/rxin/spark/test.text], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>
```
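A likely explanation for the Before output, stated as an assumption: when a format class does not override `toString`, Scala inherits `java.lang.Object#toString`, which renders the fully qualified class name followed by `@` and a hex hash code. The minimal sketch below uses a hypothetical class, `TextFormatWithoutToString`, rather than Spark's real `TextFileFormat`.

```scala
// Hypothetical stand-in for a file format class that does NOT override toString.
class TextFormatWithoutToString

val format = new TextFormatWithoutToString

// java.lang.Object#toString is getClass.getName + "@" + Integer.toHexString(hashCode).
val rendered: String = format.toString
val objectDefault: String =
  format.getClass.getName + "@" + Integer.toHexString(format.hashCode)

// Prints the class name, then "@" and a hex hash code that varies between runs.
println(s"Format: $rendered")
```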
After:
```
scala> spark.read.text("test.text").explain()
== Physical Plan ==
*FileScan text [value#15] Batched: false, Format: Text, Location: InMemoryFileIndex[file:/scratch/rxin/spark/test.text], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>
```
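A minimal sketch of the pattern behind the After output, using hypothetical stand-in classes rather than Spark's real ones: `shortName()` keeps the lower-case alias (as in the Parquet diff), while an explicit `toString` override supplies the readable label that appears after `Format:`. `FileFormatLike`, `TextFormatLike`, `ParquetFormatLike` and `formatField` are illustrative names, not Spark APIs.

```scala
// Hypothetical stand-ins for Spark's file formats, not Spark's real classes.
trait FileFormatLike {
  // Lower-case alias, e.g. "parquet" as in the reviewed diff.
  def shortName(): String
}

class TextFormatLike extends FileFormatLike {
  override def shortName(): String = "text"
  override def toString: String = "Text" // human-readable label
}

class ParquetFormatLike extends FileFormatLike {
  override def shortName(): String = "parquet"
  override def toString: String = "Parquet"
}

// Hypothetical helper mimicking the "Format: ..." fragment of a FileScan line;
// string interpolation calls toString on the format.
def formatField(f: FileFormatLike): String = s"Format: $f"

for (f <- Seq(new TextFormatLike, new ParquetFormatLike)) println(formatField(f))
```

Keeping the two methods separate lets the lower-case alias stay stable while the label in plan output follows the capitalization convention discussed in review.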
Also closes #14680.
## How was this patch tested?
Verified in spark-shell.
Author: Reynold Xin
Closes #16187 from rxin/SPARK-18760.
(cherry picked from commit 5f894d2)
Signed-off-by: Reynold Xin