[SPARK-19919][SQL] Defer throwing the exception for empty paths in CSV datasource into DataSource
#17256
|
Test build #74381 has started for PR 17256 at commit |
|
retest this please |
|
Test build #74386 has finished for PR 17256 at commit
|
DataSourceDataSource
| * Infers the schema from `inputPaths` files. | ||
| */ | ||
| def infer( | ||
| final def infer( |
This resembles the one in JsonDataSource.
|
Test build #74401 has finished for PR 17256 at commit
|
DataSourceDataSource
|
Test build #74425 has started for PR 17256 at commit |
|
retest this please |
|
Test build #74432 has finished for PR 17256 at commit
|
|
cc @cloud-fan, could you see if it sounds good? |
|
LGTM, merging to master! |
|
Thank you @cloud-fan. |
What changes were proposed in this pull request?
This PR proposes to defer throwing the exception for empty paths in the CSV datasource so that it is thrown within `DataSource`.
Currently, if other datasources fail to infer the schema, they return `None`, and this is then validated in `DataSource`. The CSV datasource, however, performs this check within its own implementation and throws an exception with a different message.
We could remove this duplicated check and validate it in one place, in the same way and with the same message.
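The idea can be illustrated with a minimal, self-contained sketch. It does not use Spark: `Schema`, `FileFormatSketch`, `CsvFormatSketch`, `JsonFormatSketch`, `SchemaInferenceException`, and `resolveSchema` are hypothetical names introduced here for illustration, and the message text is only an example. The point is that each format returns `None` when it cannot infer a schema, and a single caller turns that into one exception with one message.

```scala
// A sketch only: every name below is a hypothetical stand-in, not the Spark API.
object SchemaInferenceSketch {
  // Stand-in for a schema type.
  final case class Schema(fieldNames: Seq[String])

  // Stand-in for the exception raised by the central check.
  final class SchemaInferenceException(message: String) extends Exception(message)

  // Each format only reports whether it could infer a schema.
  trait FileFormatSketch {
    def name: String
    def inferSchema(paths: Seq[String]): Option[Schema]
  }

  object CsvFormatSketch extends FileFormatSketch {
    val name = "CSV"
    // No format-specific exception for empty input: return None instead.
    def inferSchema(paths: Seq[String]): Option[Schema] =
      if (paths.nonEmpty) Some(Schema(Seq("_c0"))) else None
  }

  object JsonFormatSketch extends FileFormatSketch {
    val name = "JSON"
    def inferSchema(paths: Seq[String]): Option[Schema] =
      if (paths.nonEmpty) Some(Schema(Seq("value"))) else None
  }

  // The single validation point: same check and same message for every format.
  def resolveSchema(format: FileFormatSketch, paths: Seq[String]): Schema =
    format.inferSchema(paths).getOrElse {
      throw new SchemaInferenceException(
        s"Unable to infer schema for ${format.name}. It must be specified manually.")
    }

  def main(args: Array[String]): Unit = {
    // Both formats now fail in the same place when given no input paths.
    for (format <- Seq(CsvFormatSketch, JsonFormatSketch)) {
      val failure = scala.util.Try(resolveSchema(format, Seq.empty)).failed.get
      println(failure.getMessage)
    }
  }
}
```

With this shape, a format-specific check such as the one CSV had becomes unnecessary, because an empty set of paths is reported through the same `Option` result as any other inference failure.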
How was this patch tested?
Unit test in `CSVSuite` and manual test.