
Conversation

@HyukjinKwon
Member

What changes were proposed in this pull request?

This PR adds tests for writing empty data and reading it back for the Parquet, JSON and Text data sources.

The tests were not added to HadoopFsRelationTest because each test differs slightly due to differences among those data sources; a sketch of the common round trip appears after the list below:

  • JSON does not write a schema when the data is empty.
  • TEXT needs a dataSchema option.
  • Parquet writes the schema even when the data is empty.
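
As a rough sketch of the round trip these tests exercise (hypothetical, not the exact test code in this PR; it assumes a SparkSession named `spark` and writable scratch paths):

```scala
// Hypothetical sketch of the round trip, not the exact test in this PR.
// Assumes a SparkSession named `spark` and writable scratch paths.
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(Seq(
  StructField("a", IntegerType, nullable = true),
  StructField("b", StringType, nullable = true)))

// An empty DataFrame with an explicit schema.
val emptyDF = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

// Parquet writes the schema even with zero rows, so the round trip preserves it.
emptyDF.write.mode("overwrite").parquet("/tmp/empty-parquet")
val parquetBack = spark.read.parquet("/tmp/empty-parquet")
assert(parquetBack.schema === emptyDF.schema && parquetBack.count() === 0)

// JSON output carries no schema, so reading the empty result back needs one supplied.
emptyDF.write.mode("overwrite").json("/tmp/empty-json")
val jsonBack = spark.read.schema(schema).json("/tmp/empty-json")
assert(jsonBack.count() === 0)
```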

How was this patch tested?

Unit tests in ParquetHadoopFsRelationSuite, JsonHadoopFsRelationSuite and SimpleTextHadoopFsRelationSuite.

@SparkQA

SparkQA commented May 22, 2016

Test build #59102 has finished for PR 13253 at commit d450094.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

retest this please

@SparkQA

SparkQA commented May 22, 2016

Test build #59103 has finished for PR 13253 at commit d450094.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member Author

HyukjinKwon commented May 24, 2016

Hi @rxin and @marmbrus,
As you already know, a "critical" issue was found here, SPARK-15393, so SPARK-10216 was reverted. It seems writing empty data and reading it back has not been tested across data sources.
This PR includes a test that resembles the one provided in the JIRA ticket.
Could you please take a look?

@rxin
Contributor

rxin commented May 24, 2016

Did we ever end up fixing https://issues.apache.org/jira/browse/SPARK-10216 after it was reverted?

@HyukjinKwon
Member Author

HyukjinKwon commented May 24, 2016

@rxin No, it has not been fixed. So, I wanted to add some tests first that check writing and reading empty data, to make sure this works.

The way to fix SPARK-10216 might vary if these are data-source-specific issues. For example, ORC does not write files for empty data and also does not allow reading empty files (SPARK-8501); see the sketch below.

So, I thought I could focus on fixing SPARK-10216 within, for example, Parquet (not WriterContainer) if SPARK-15393 is a problem specific to the Parquet data source.
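
To illustrate the ORC behavior mentioned above, a minimal sketch (an assumption-laden illustration, not code from this PR; it assumes a SparkSession named `spark` and a scratch path):

```scala
// Hedged sketch of the ORC behavior described above (SPARK-8501), not code from this PR.
// Assumes a SparkSession named `spark` and a writable scratch path.
import scala.util.Try

val emptyDF = spark.range(0).toDF("id")  // zero-row DataFrame with a known schema

// As described, ORC writes no data files for an empty DataFrame...
emptyDF.write.mode("overwrite").orc("/tmp/empty-orc")

// ...so reading the path back cannot infer a schema and is expected to fail.
assert(Try(spark.read.orc("/tmp/empty-orc")).isFailure)
```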

@HyukjinKwon
Member Author

HyukjinKwon commented May 24, 2016

I don't mind closing this; I will close it if you think so. I can do this together later with SPARK-10216.

I would just appreciate being sure whether writing and reading empty files should be supported for all data sources or only some of them, for example, maybe only Parquet, ORC and CSV, because they can write the schema separately even if the data is empty.

@SparkQA

SparkQA commented May 24, 2016

Test build #59196 has finished for PR 13253 at commit c51fbe3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented May 24, 2016

Yea let's just do it together. Thanks.

@HyukjinKwon
Member Author

HyukjinKwon commented May 24, 2016

Closing this. But could I please ask whether it is generally preferred to support writing empty data and reading it back for all data sources (or maybe only for Parquet, ORC and CSV)?
I just want to be sure about that.

@rxin
Contributor

rxin commented May 24, 2016

Yes, we definitely want to be able to read/write empty DataFrames.

@HyukjinKwon
Member Author

@rxin Thank you!

