[SPARK-14467][SQL] Experiments: Async I/O in FileScanRDD #12667

sameeragarwal · 2016-04-25T20:40:35Z

What changes were proposed in this pull request?

Builds on #12243 to help benchmark the improvements gained by interleaving CPU and I/O in FileScanRDD.

How was this patch tested?

Existing Tests

This patch updates FileScanRDD to start reading the next file while the current file is still being processed. The goal is better interleaving of CPU and I/O. It does this by launching a future that asynchronously starts preparing the next file to be read. The expectation is that this async task is I/O-intensive, while processing the current file (which includes all the computation for the query plan) is CPU-intensive. For some file formats, preparing the next file just means opening it and doing the initial setup; for formats like Parquet, it means doing all the I/O for all the columns.
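A minimal sketch of the prefetching idea described above, written in Scala (Spark's implementation language). It is not the code from this patch: `PrefetchingIterator`, its `prepare` function, and `PrefetchDemo` are hypothetical names, and `prepare` stands in for whatever I/O a format needs (just opening the file, or reading every column for Parquet). While the caller does CPU work on the file returned by `next()`, a `Future` is already preparing the following one.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

/** Hypothetical sketch: hands out prepared files one at a time while the
  * next file is prepared on a background thread. `prepare` is an assumed
  * stand-in for the I/O-heavy step (open file, or read all column chunks). */
final class PrefetchingIterator[F, R](files: Seq[F], prepare: F => R)(
    implicit ec: ExecutionContext)
    extends Iterator[R] {

  private val remaining = files.iterator
  // Future for the file that the next call to next() will return.
  private var pending: Option[Future[R]] = launchNext()

  // Start preparing the next file asynchronously, if there is one.
  private def launchNext(): Option[Future[R]] =
    if (remaining.hasNext) {
      val file = remaining.next()
      Some(Future(prepare(file)))
    } else None

  override def hasNext: Boolean = pending.isDefined

  override def next(): R = {
    val current = pending.getOrElse(throw new NoSuchElementException("no more files"))
    // Block until the current file is ready; ideally it already is.
    // Any exception thrown by `prepare` is rethrown here.
    val result = Await.result(current, Duration.Inf)
    // Kick off I/O for the following file before the caller starts CPU work on `result`.
    pending = launchNext()
    result
  }
}

object PrefetchDemo {
  def main(args: Array[String]): Unit = {
    val pool = Executors.newFixedThreadPool(1)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // Simulated files: "preparing" one just upper-cases its name.
      val files = Seq("a.parquet", "b.parquet", "c.parquet")
      val it = new PrefetchingIterator(files, (name: String) => name.toUpperCase)
      println(it.toList) // prints List(A.PARQUET, B.PARQUET, C.PARQUET)
    } finally pool.shutdown()
  }
}
```

Only one `Future` is outstanding at a time, so the sketch keeps exactly one file of lookahead, matching the "next file" behaviour described in the PR. If the consumer stops iterating early, the last launched future still runs to completion, so a real implementation would also need to cancel or clean up that work.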

SparkQA · 2016-04-25T22:06:06Z

Test build #56922 has finished for PR 12667 at commit f3a2167.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

sameeragarwal · 2016-04-28T18:03:45Z

test this please

SparkQA · 2016-04-28T19:27:24Z

Test build #57265 has finished for PR 12667 at commit 796d5eb.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

nongli and others added 6 commits April 7, 2016 14:45

Simplify and fix tests.

bc11dd5

Resolve conflicts

0655e5e

restructure

8aebf94

add nextIterator

8799cc8

cleanup

f3a2167

sameeragarwal closed this Apr 25, 2016

sameeragarwal added 2 commits April 27, 2016 15:44

Merge branch 'master' of github.com:apache/spark into filescan

ec7d65d

fix conf

796d5eb

sameeragarwal reopened this Apr 28, 2016

sameeragarwal closed this Apr 28, 2016

3 participants
