@HyukjinKwon
Member

What changes were proposed in this pull request?

This PR proposes to defer throwing the exception for empty paths in the CSV datasource to DataSource.

Currently, when the other datasources fail to infer the schema, they return None, and this is then validated in DataSource, as below:

scala> spark.read.json("emptydir")
org.apache.spark.sql.AnalysisException: Unable to infer schema for JSON. It must be specified manually.;
scala> spark.read.orc("emptydir")
org.apache.spark.sql.AnalysisException: Unable to infer schema for ORC. It must be specified manually.;
scala> spark.read.parquet("emptydir")
org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;

However, the CSV datasource performs this check within its own implementation and throws a different exception with a different message, as below:

scala> spark.read.csv("emptydir")
java.lang.IllegalArgumentException: requirement failed: Cannot infer schema from an empty set of files

We could remove this duplicated check from the CSV datasource and validate it in one place, DataSource, in the same way and with the same message.
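
To illustrate the idea, here is a minimal, self-contained sketch of deferring the check to a single caller. It does not use Spark's real classes: `Schema`, `SchemaInferenceException`, `FileFormatLike`, `CsvLike`, `JsonLike` and `resolveSchema` are all hypothetical stand-ins, and the actual change in this PR may be structured differently. Only the two error messages are taken from the output shown above.

```scala
// Simplified stand-ins for illustration only; none of these are Spark classes.
object DeferredSchemaCheck {
  // Hypothetical schema representation: just the column names.
  final case class Schema(columns: Seq[String])

  // Hypothetical stand-in for the exception raised by the central check.
  final class SchemaInferenceException(message: String) extends Exception(message)

  // Each format only reports whether it could infer a schema.
  trait FileFormatLike {
    def shortName: String
    def inferSchema(files: Seq[String]): Option[Schema]
  }

  object CsvLike extends FileFormatLike {
    val shortName = "CSV"
    // Previously the equivalent of:
    //   require(files.nonEmpty, "Cannot infer schema from an empty set of files")
    // Now an empty input yields None and the caller decides how to report it.
    def inferSchema(files: Seq[String]): Option[Schema] =
      if (files.isEmpty) None else Some(Schema(Seq("_c0")))
  }

  object JsonLike extends FileFormatLike {
    val shortName = "JSON"
    def inferSchema(files: Seq[String]): Option[Schema] =
      if (files.isEmpty) None else Some(Schema(Seq("value")))
  }

  // The single place where a missing schema becomes an error,
  // so every format fails with the same message.
  def resolveSchema(format: FileFormatLike, files: Seq[String]): Schema =
    format.inferSchema(files).getOrElse {
      throw new SchemaInferenceException(
        s"Unable to infer schema for ${format.shortName}. It must be specified manually.")
    }
}
```

With this shape, `DeferredSchemaCheck.resolveSchema(DeferredSchemaCheck.CsvLike, Seq.empty)` fails in the same way, and with the same wording, as the JSON case.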

How was this patch tested?

Unit test in CSVSuite and manual test.

@SparkQA

SparkQA commented Mar 11, 2017

Test build #74381 has started for PR 17256 at commit 9d91da1.

@HyukjinKwon
Member Author

retest this please

@SparkQA

SparkQA commented Mar 11, 2017

Test build #74386 has finished for PR 17256 at commit 9d91da1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon changed the title from "[SPARK-19919][SQL] Defer throwing the exception for empty paths in CSV datasource into DataSource" to "[WIP][SPARK-19919][SQL] Defer throwing the exception for empty paths in CSV datasource into DataSource" on Mar 12, 2017
* Infers the schema from `inputPaths` files.
*/
-def infer(
+final def infer(
Member Author

This resembles JsonDataSource's one.

@SparkQA

SparkQA commented Mar 12, 2017

Test build #74401 has finished for PR 17256 at commit 04e620c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon changed the title from "[WIP][SPARK-19919][SQL] Defer throwing the exception for empty paths in CSV datasource into DataSource" to "[SPARK-19919][SQL] Defer throwing the exception for empty paths in CSV datasource into DataSource" on Mar 12, 2017
@SparkQA

SparkQA commented Mar 13, 2017

Test build #74425 has started for PR 17256 at commit 87d3fc8.

@HyukjinKwon
Member Author

retest this please

@SparkQA

SparkQA commented Mar 13, 2017

Test build #74432 has finished for PR 17256 at commit 87d3fc8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

cc @cloud-fan, could you see if it sounds good?

@cloud-fan
Contributor

LGTM, merging to master!

@HyukjinKwon
Member Author

Thank you @cloud-fan.

@asfgit asfgit closed this in 9281a3d Mar 22, 2017
@HyukjinKwon HyukjinKwon deleted the SPARK-19919 branch January 2, 2018 03:44

3 participants