
Conversation

@rxin
Contributor

@rxin rxin commented Dec 7, 2016

What changes were proposed in this pull request?

This patch fixes the format specification in explain for file sources (Parquet and Text formats are the only two that are different from the rest):

Before:

```
scala> spark.read.text("test.text").explain()
== Physical Plan ==
*FileScan text [value#15] Batched: false, Format: org.apache.spark.sql.execution.datasources.text.TextFileFormat@xyz, Location: InMemoryFileIndex[file:/scratch/rxin/spark/test.text], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>
```

After:

```
scala> spark.read.text("test.text").explain()
== Physical Plan ==
*FileScan text [value#15] Batched: false, Format: Text, Location: InMemoryFileIndex[file:/scratch/rxin/spark/test.text], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>
```

Also closes #14680.

How was this patch tested?

Verified in spark-shell.
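
A minimal, self-contained sketch of why the output changes (the class names below are illustrative stand-ins, not the actual Spark classes):

```scala
// Illustrative only: without a toString override, Scala falls back to
// java.lang.Object#toString, which prints the fully qualified class name
// plus a hash, i.e. the noisy "Before" output above.
trait FileFormatLike {
  def shortName(): String
}

class TextLikeFormat extends FileFormatLike {
  override def shortName(): String = "text"
  // Overriding toString is what produces the readable "Format: Text".
  override def toString: String = "Text"
}

object ExplainDemo extends App {
  val format = new TextLikeFormat
  println(s"Format: $format")  // prints: Format: Text
}
```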

@rxin
Contributor Author

rxin commented Dec 7, 2016

cc @cloud-fan

@rxin
Contributor Author

rxin commented Dec 7, 2016

Actually, if possible, please merge this into branch-2.1.

@SparkQA

SparkQA commented Dec 7, 2016

Test build #69772 has finished for PR 16187 at commit 566c800.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 7, 2016

Test build #69779 has started for PR 16187 at commit 73d7910.

```diff
  override def shortName(): String = "parquet"

- override def toString: String = "ParquetFormat"
+ override def toString: String = "Parquet"
```
Member

Just did a quick check. The other formats (JSON, CSV, and ORC) use upper case. Do you think we need to make them consistent?

Contributor Author

All the other formats are acronyms.

Member

Yeah, got it. The official documentation for these three formats uses upper case.

@gatorsmile
Member

LGTM pending test.

@SparkQA

SparkQA commented Dec 7, 2016

Test build #3473 has finished for PR 16187 at commit 73d7910.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

retest this please

@SparkQA

SparkQA commented Dec 7, 2016

Test build #69812 has finished for PR 16187 at commit 73d7910.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

retest this please

@SparkQA

SparkQA commented Dec 8, 2016

Test build #3474 has finished for PR 16187 at commit 73d7910.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 8, 2016

Test build #69834 has finished for PR 16187 at commit 767ff2f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

retest this please

@SparkQA

SparkQA commented Dec 8, 2016

Test build #69848 has started for PR 16187 at commit 767ff2f.

@SparkQA

SparkQA commented Dec 8, 2016

Test build #3478 has finished for PR 16187 at commit 767ff2f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

cloud-fan commented Dec 8, 2016

> We currently rely on FileFormat implementations to override toString in order to get a proper explain output. It'd be better to just depend on shortName for those.

Seems the PR description is outdated? We still depend on toString for the explain output, right?
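
For illustration, a hypothetical way of deriving the display name from shortName rather than overriding toString in each format (this is not what the merged patch does; the object and method names below are made up):

```scala
// Hypothetical sketch (not part of this patch): derive the explain display
// name from a format's shortName instead of a per-format toString override.
object FormatDisplay {
  // Acronym formats keep their upper-case spelling; others are capitalized.
  private val acronyms = Set("json", "csv", "orc")

  def displayName(shortName: String): String =
    if (acronyms.contains(shortName.toLowerCase)) shortName.toUpperCase
    else shortName.capitalize
}

// FormatDisplay.displayName("parquet") == "Parquet"
// FormatDisplay.displayName("text")    == "Text"
// FormatDisplay.displayName("json")    == "JSON"
```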

@rxin
Contributor Author

rxin commented Dec 8, 2016

Fixed; let me merge this into master/branch-2.1.

asfgit pushed a commit that referenced this pull request Dec 8, 2016
## What changes were proposed in this pull request?
This patch fixes the format specification in explain for file sources (Parquet and Text formats are the only two that are different from the rest):

Before:
```
scala> spark.read.text("test.text").explain()
== Physical Plan ==
*FileScan text [value#15] Batched: false, Format: org.apache.spark.sql.execution.datasources.text.TextFileFormat@xyz, Location: InMemoryFileIndex[file:/scratch/rxin/spark/test.text], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>
```

After:
```
scala> spark.read.text("test.text").explain()
== Physical Plan ==
*FileScan text [value#15] Batched: false, Format: Text, Location: InMemoryFileIndex[file:/scratch/rxin/spark/test.text], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<value:string>
```

Also closes #14680.

## How was this patch tested?
Verified in spark-shell.

Author: Reynold Xin <[email protected]>

Closes #16187 from rxin/SPARK-18760.

(cherry picked from commit 5f894d2)
Signed-off-by: Reynold Xin <[email protected]>
@asfgit asfgit closed this in 5f894d2 Dec 8, 2016
robert3005 pushed a commit to palantir/spark that referenced this pull request Dec 15, 2016
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
