@gatorsmile
Member

What changes were proposed in this pull request?

This PR backports #16481 to Spark 2.1.

DataFrameWriter's save() API performs an unnecessary full filesystem scan of the saved files. Because save() is the most basic, core API in DataFrameWriter, this scan should be avoided.

How was this patch tested?

Added new test cases and modified existing ones.

@SparkQA

SparkQA commented Jan 15, 2017

Test build #71402 has finished for PR 16588 at commit a246ae9.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

retest this please

@SparkQA

SparkQA commented Jan 15, 2017

Test build #71403 has finished for PR 16588 at commit a246ae9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

cc @cloud-fan

@cloud-fan
Contributor

thanks, merging to 2.1

asfgit pushed a commit that referenced this pull request Jan 16, 2017
… not scan all the saved files #16481

### What changes were proposed in this pull request?

#### This PR is to backport #16481 to Spark 2.1
---
`DataFrameWriter`'s [save() API](https://github.com/gatorsmile/spark/blob/5d38f09f47a767a342a0a8219c63efa2943b5d1f/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L207) performs an unnecessary full filesystem scan of the saved files. Because save() is the most basic, core API in `DataFrameWriter`, this scan should be avoided.

### How was this patch tested?
Added new test cases and modified existing ones.

Author: gatorsmile

Closes #16588 from gatorsmile/backport-19092.
@gatorsmile
Member Author

Thanks!

@gatorsmile closed this Jan 16, 2017