[SPARK-19092] [SQL] [Backport-2.1] Save() API of DataFrameWriter should not scan all the saved files #16481 #16588

gatorsmile · 2017-01-15T19:32:30Z

What changes were proposed in this pull request?

This PR backports #16481 to Spark 2.1.

DataFrameWriter's save() API performs an unnecessary full filesystem scan of the saved files. Because save() is the most basic, core API in DataFrameWriter, we should avoid this scan.

How was this patch tested?

Added and modified the test cases

SparkQA · 2017-01-15T19:49:24Z

Test build #71402 has finished for PR 16588 at commit a246ae9.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-01-15T19:52:40Z

retest this please

SparkQA · 2017-01-15T22:14:37Z

Test build #71403 has finished for PR 16588 at commit a246ae9.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2017-01-15T22:32:47Z

cc @cloud-fan

cloud-fan · 2017-01-16T02:58:34Z

thanks, merging to 2.1

… not scan all the saved files #16481

### What changes were proposed in this pull request?

#### This PR is to backport #16481 to Spark 2.1

---

`DataFrameWriter`'s [save() API](https://github.com/gatorsmile/spark/blob/5d38f09f47a767a342a0a8219c63efa2943b5d1f/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L207) is performing an unnecessary full filesystem scan for the saved files. The save() API is the most basic/core API in `DataFrameWriter`. We should avoid it.

### How was this patch tested?

Added and modified the test cases

Author: gatorsmile <[email protected]>

Closes #16588 from gatorsmile/backport-19092.

gatorsmile · 2017-01-16T04:54:45Z

Thanks!

backport

a246ae9

gatorsmile closed this Jan 16, 2017
