@sameeragarwal
Member

What changes were proposed in this pull request?

Builds on #12243 to help benchmark the improvements from interleaving CPU and I/O in FileScanRDD.

How was this patch tested?

Existing Tests

nongli and others added 6 commits April 7, 2016 14:45
This patch updates FileScanRDD to start reading from the next file while the current file
is being processed. The goal is to have better interleaving of CPU and IO. It does this
by launching a future which will asynchronously start preparing the next file to be read.
The expectation is that the async task is IO intensive and the current file (which
includes all the computation for the query plan) is CPU intensive. For some file formats,
this would just mean opening the file and doing the initial setup. For file formats like
Parquet, this would mean doing all the IO for all the columns.
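The prefetching pattern in the commit message above can be illustrated outside Spark. What follows is a minimal Python sketch of the idea and is not the Scala code in FileScanRDD. The names `prefetching_scan` and `prepare` are hypothetical. `prepare` stands in for the I/O-heavy step (opening a file or, for Parquet, reading the column data). The caller's loop body plays the role of the CPU-heavy query computation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel marking an exhausted iterator


def prefetching_scan(
    files: Iterable[str],
    prepare: Callable[[str], T],  # hypothetical I/O-bound step, e.g. open + read
    executor: ThreadPoolExecutor,
) -> Iterator[T]:
    """Yield prepare(f) for each file in order, preparing the next file
    asynchronously while the caller is still processing the current one."""
    it = iter(files)
    first = next(it, _DONE)
    if first is _DONE:
        return
    pending = executor.submit(prepare, first)
    for path in it:
        current = pending.result()                # block until this file's I/O is done
        pending = executor.submit(prepare, path)  # kick off the next file's I/O
        yield current                             # caller's CPU work overlaps that I/O
    yield pending.result()                        # last file: nothing left to prefetch
```

With a single-worker pool, at most one file is prepared ahead of the consumer. This matches the one-file lookahead described above. A deeper lookahead would give more overlap at the cost of holding more prepared data in memory.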
@SparkQA

SparkQA commented Apr 25, 2016

Test build #56922 has finished for PR 12667 at commit f3a2167.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sameeragarwal sameeragarwal reopened this Apr 28, 2016
@sameeragarwal
Member Author

test this please

@SparkQA

SparkQA commented Apr 28, 2016

Test build #57265 has finished for PR 12667 at commit 796d5eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
