[SPARK-18760][SQL] Consistent format specification for FileFormats #16187

rxin · 2016-12-07T05:23:42Z

What changes were proposed in this pull request?

This patch fixes the format specification in explain for file sources (Parquet and Text formats are the only two that are different from the rest):

Before:

scala> spark.read.text("test.text").explain()
== Physical Plan ==
*FileScan text [value#15] Batched: false, Format: org.apache.spark.sql.execution.datasources.text.TextFileFormat@xyz, Location: InMemoryFileIndex[file:/scratch/rxin/spark/test.text], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>

After:

scala> spark.read.text("test.text").explain()
== Physical Plan ==
*FileScan text [value#15] Batched: false, Format: Text, Location: InMemoryFileIndex[file:/scratch/rxin/spark/test.text], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>

Also closes #14680.
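
For readers wondering where the `TextFileFormat@xyz` string comes from: a class that does not override `toString` falls back to `java.lang.Object.toString`, which prints the class name, an `@`, and the hex hash code. The sketch below reproduces that behaviour with hypothetical stand-in classes (`DemoFileFormat`, `DemoTextFormatBefore`, `DemoTextFormatAfter`, and `formatEntry` are all illustrative names, not Spark's actual code).

```scala
// Hypothetical stand-ins for illustration only; these are not Spark classes.
trait DemoFileFormat {
  def shortName(): String
}

// No toString override: falls back to java.lang.Object.toString,
// which renders as <class name>@<hex hash code>.
class DemoTextFormatBefore extends DemoFileFormat {
  override def shortName(): String = "text"
}

// With an override, the rendered name is stable and readable.
class DemoTextFormatAfter extends DemoFileFormat {
  override def shortName(): String = "text"
  override def toString: String = "Text"
}

object FormatStringDemo {
  // Assumed rendering: the format object is interpolated, so toString decides the text.
  def formatEntry(format: DemoFileFormat): String = s"Format: $format"

  def main(args: Array[String]): Unit = {
    println(formatEntry(new DemoTextFormatBefore)) // class name followed by @ and a hash
    println(formatEntry(new DemoTextFormatAfter))  // Format: Text
  }
}
```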

How was this patch tested?

Verified in spark-shell.

rxin · 2016-12-07T05:24:16Z

cc @cloud-fan

rxin · 2016-12-07T06:08:09Z

Actually if possible please merge this in branch-2.1.

SparkQA · 2016-12-07T06:54:30Z

Test build #69772 has finished for PR 16187 at commit 566c800.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-12-07T07:12:38Z

Test build #69779 has started for PR 16187 at commit 73d7910.

gatorsmile · 2016-12-07T08:35:50Z

...re/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala

  override def shortName(): String = "parquet"

-  override def toString: String = "ParquetFormat"
+  override def toString: String = "Parquet"


Just did a quick check. The other formats (JSON, CSV and ORC) use upper case. Do you think we need to make them consistent?

All the other formats are acronyms.

Yeah, got it. The official documentation for these three formats uses upper case.
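
The naming convention agreed on in this thread (acronyms in upper case, other names capitalized) can be written down as a small rule. This is only a sketch: `FormatDisplayName` and `displayName` are hypothetical helpers, and the acronym set is limited to the three formats named above. The patch itself hard-codes each `toString` rather than deriving it.

```scala
import java.util.Locale

object FormatDisplayName {
  // Short names treated as acronyms; taken from the formats named in this thread.
  private val acronyms = Set("json", "csv", "orc")

  // Hypothetical helper: upper-case acronyms, capitalize everything else.
  def displayName(shortName: String): String = {
    val lower = shortName.toLowerCase(Locale.ROOT)
    if (acronyms.contains(lower)) lower.toUpperCase(Locale.ROOT)
    else lower.capitalize
  }

  def main(args: Array[String]): Unit = {
    // Print the display name for each short name mentioned in the thread.
    Seq("parquet", "text", "json", "csv", "orc").foreach { name =>
      println(s"$name -> ${displayName(name)}")
    }
  }
}
```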

gatorsmile · 2016-12-07T08:38:48Z

LGTM pending test.

SparkQA · 2016-12-07T10:36:44Z

Test build #3473 has finished for PR 16187 at commit 73d7910.

This patch fails Spark unit tests.
This patch does not merge cleanly.
This patch adds no public classes.

gatorsmile · 2016-12-07T19:58:54Z

retest this please

SparkQA · 2016-12-07T21:39:00Z

Test build #69812 has finished for PR 16187 at commit 73d7910.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2016-12-07T21:57:51Z

retest this please

SparkQA · 2016-12-08T00:17:20Z

Test build #3474 has finished for PR 16187 at commit 73d7910.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-12-08T02:42:19Z

Test build #69834 has finished for PR 16187 at commit 767ff2f.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2016-12-08T06:29:47Z

retest this please

SparkQA · 2016-12-08T06:32:38Z

Test build #69848 has started for PR 16187 at commit 767ff2f.

SparkQA · 2016-12-08T10:34:29Z

Test build #3478 has finished for PR 16187 at commit 767ff2f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2016-12-08T15:08:41Z

We currently rely on FileFormat implementations to override toString in order to get a proper explain output. It'd be better to just depend on shortName for those.

Seems the PR description is outdated? We still depend on toString for explain output right?
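
For context, the `shortName`-based alternative from the quoted description could look roughly like the sketch below. It assumes a hypothetical trait (`NamedFormat`) that derives its default `toString` from `shortName`; this is not what the merged patch does, and the names are illustrative only.

```scala
// Hypothetical trait for illustration; this is not Spark's FileFormat.
trait NamedFormat {
  def shortName(): String

  // Default display name derived from shortName, so implementations
  // get a readable string without overriding toString themselves.
  override def toString: String = shortName()
}

class DemoParquetFormat extends NamedFormat {
  override def shortName(): String = "parquet"
}

// An implementation can still override toString to pick its own casing.
class DemoCsvFormat extends NamedFormat {
  override def shortName(): String = "csv"
  override def toString: String = "CSV"
}

object ShortNameDemo {
  def main(args: Array[String]): Unit = {
    println(s"Format: ${new DemoParquetFormat}") // Format: parquet
    println(s"Format: ${new DemoCsvFormat}")     // Format: CSV
  }
}
```

One trade-off visible here: a `shortName`-derived default is lower case, so display names such as `Parquet` or `CSV` still need an explicit `toString` override.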

rxin · 2016-12-08T20:51:40Z

Fixed. Let me merge this into master/branch-2.1.

## What changes were proposed in this pull request?

This patch fixes the format specification in explain for file sources (Parquet and Text formats are the only two that are different from the rest):

Before:

```
scala> spark.read.text("test.text").explain()
== Physical Plan ==
*FileScan text [value#15] Batched: false, Format: org.apache.spark.sql.execution.datasources.text.TextFileFormat@xyz, Location: InMemoryFileIndex[file:/scratch/rxin/spark/test.text], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>
```

After:

```
scala> spark.read.text("test.text").explain()
== Physical Plan ==
*FileScan text [value#15] Batched: false, Format: Text, Location: InMemoryFileIndex[file:/scratch/rxin/spark/test.text], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>
```

Also closes #14680.

## How was this patch tested?

Verified in spark-shell.

Author: Reynold Xin <[email protected]>

Closes #16187 from rxin/SPARK-18760.

(cherry picked from commit 5f894d2)
Signed-off-by: Reynold Xin <[email protected]>


Use a more conservative fix

73d7910

rxin force-pushed the SPARK-18760 branch from 566c800 to 73d7910 on December 7, 2016 07:07

gatorsmile reviewed Dec 7, 2016


Fix test

767ff2f

asfgit closed this in 5f894d2 Dec 8, 2016
