[SPARK-15475][SQL] Add tests for writing and reading back empty data for Parquet, Json and Text data sources #13253
Test build #59102 has finished for PR 13253 at commit
retest this please
Test build #59103 has finished for PR 13253 at commit
Hi @rxin and @marmbrus,
Did we ever end up fixing https://issues.apache.org/jira/browse/SPARK-10216 after it was reverted?
@rxin No, it has not been fixed. So I wanted to add some tests first to check that writing and reading back empty data works. The way to fix SPARK-10216 may vary if the issue is data-source specific. For example, ORC does not write files for empty data and also does not allow reading empty files (SPARK-8501). So I thought I could focus on fixing SPARK-10216 within, for example, Parquet (not
I don't mind closing this; I will close it if you think I should. I can do this later together with SPARK-10216. I would just appreciate knowing whether writing and reading empty files should be supported for all data sources or only for some of them, for example only Parquet, ORC and CSV, because they can write the schema separately even if the data is empty.
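The schema point above can be illustrated outside Spark. The following is a minimal Python sketch using only the standard library; it is not Spark code and not what this PR tests. It assumes CSV written with a header line as a stand-in for a format that records its schema even with no rows, and JSON lines as a stand-in for a format that does not. The column names id and name are hypothetical.

```python
import csv
import io
import json

# Hypothetical column names, for illustration only (not taken from the PR).
COLUMNS = ["id", "name"]


def roundtrip_csv(rows):
    """Write rows as CSV with a header line, then read them back."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()  # the header is written even when rows is empty
    writer.writerows(rows)
    buf.seek(0)
    reader = csv.DictReader(buf)
    read_rows = list(reader)
    # Column names come from the header line, not from the data rows.
    return reader.fieldnames, read_rows


def roundtrip_json_lines(rows):
    """Write rows as JSON lines (one object per line), then read them back."""
    buf = io.StringIO()
    for row in rows:
        buf.write(json.dumps(row) + "\n")
    buf.seek(0)
    read_rows = [json.loads(line) for line in buf if line.strip()]
    # Column names can only be recovered from the records themselves.
    fields = sorted({key for row in read_rows for key in row})
    return fields, read_rows


empty = []
print(roundtrip_csv(empty))         # (['id', 'name'], [])
print(roundtrip_json_lines(empty))  # ([], [])
```

With no rows, the CSV round trip still reports the column names, while the JSON-lines round trip has nothing to infer them from, which is one reason reading empty data back may need data-source-specific handling.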
Test build #59196 has finished for PR 13253 at commit
Yea, let's just do it together. Thanks.
Closing this. But could I please ask whether it is generally preferred to support writing and reading back empty data for all data sources (or perhaps only for Parquet, ORC and CSV)?
Yes, we definitely want to be able to read/write empty DataFrames.
@rxin Thank you!
What changes were proposed in this pull request?
This PR adds tests for writing and reading back empty data for the Parquet, JSON and Text data sources.
The tests were not added in HadoopFsRelationTest because each test is a little different, due to the differences among those data sources.
dataSchema option
How was this patch tested?
Unit tests in ParquetHadoopFsRelationSuite, JsonHadoopFsRelationSuite and SimpleTextHadoopFsRelationSuite.