@liyezhang556520
Contributor

What changes were proposed in this pull request?

When Netty transfers data that is not a FileRegion, the data is held in a ByteBuf. If the data is large, performance degrades significantly because of the memory copy that happens underneath in sun.nio.ch.IOUtil.write: CPU usage reaches 100% while network throughput stays very low.

In this PR, if the data is large, we split it into small chunks before calling WritableByteChannel.write(), which avoids the wasted memory copies. The chunking matters because large data cannot be written in a single write, so transferTo is called multiple times, and without chunking each of those calls would repeat the copy for the data that is still unwritten.
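The chunking idea can be sketched in plain NIO as follows. This is a minimal illustration, not the code from this patch: the class name `ChunkedWrite`, the helpers `writeChunk` and `writeAll`, and the 256 KiB `CHUNK_LIMIT` are all assumptions made for the example, and it works on a `ByteBuffer` rather than Netty's `ByteBuf`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

final class ChunkedWrite {
    // Hypothetical per-write limit, chosen only for illustration.
    static final int CHUNK_LIMIT = 256 * 1024;

    /**
     * Offers at most CHUNK_LIMIT bytes of src to the channel and advances
     * src by the number of bytes the channel actually accepted.
     */
    static int writeChunk(ByteBuffer src, WritableByteChannel target) throws IOException {
        int length = Math.min(src.remaining(), CHUNK_LIMIT);
        // A duplicate shares the bytes but has its own position and limit,
        // so only this small window is handed to the channel.
        ByteBuffer window = src.duplicate();
        window.limit(window.position() + length);
        int written = target.write(window);
        src.position(src.position() + written);
        return written;
    }

    /**
     * Drains src chunk by chunk. This loop assumes a blocking channel; with a
     * non-blocking channel a write may accept 0 bytes, and the caller should
     * return and retry later instead of looping.
     */
    static long writeAll(ByteBuffer src, WritableByteChannel target) throws IOException {
        long total = 0;
        while (src.hasRemaining()) {
            total += writeChunk(src, target);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer data = ByteBuffer.allocate(1_000_000);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long written = writeAll(data, Channels.newChannel(sink));
        System.out.println("bytes written: " + written);
    }
}
```

Because each call hands the channel a bounded window, any temporary copy made below `write()` is bounded by the chunk size rather than by the size of the whole remaining payload.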

How was this patch tested?

Spark unit tests and a manual test.
Manual test:
sc.parallelize(Array(1,2,3),3).mapPartitions(a=>Array(new Array[Double](1024 * 1024 * 50)).iterator).reduce((a,b)=> a).length

For more details, please refer to SPARK-14290

@liyezhang556520
Contributor Author

cc @davies

@liyezhang556520 liyezhang556520 changed the title [SPARK-14290][CORE][backport-1.6] avoid significant memory copy in Netty's tran… [SPARK-14290][SPARK-13352][CORE][backport-1.6] avoid significant memory copy in Netty's tran… Apr 11, 2016
@SparkQA

SparkQA commented Apr 11, 2016

Test build #55516 has finished for PR 12296 at commit 9e37e7c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Contributor

davies commented Apr 11, 2016

LGTM, merging this into branch-1.6, thanks!

asfgit pushed a commit that referenced this pull request Apr 11, 2016
…emory copy in Netty's tran…

## What changes were proposed in this pull request?
When Netty transfers data that is not a `FileRegion`, the data is held in a `ByteBuf`. If the data is large, performance degrades significantly because of the memory copy that happens underneath in `sun.nio.ch.IOUtil.write`: CPU usage reaches 100% while network throughput stays very low.

In this PR, if the data is large, we split it into small chunks before calling `WritableByteChannel.write()`, which avoids the wasted memory copies. The chunking matters because large data cannot be written in a single write, so `transferTo` is called multiple times, and without chunking each of those calls would repeat the copy for the data that is still unwritten.

## How was this patch tested?
Spark unit tests and a manual test.
Manual test:
`sc.parallelize(Array(1,2,3),3).mapPartitions(a=>Array(new Array[Double](1024 * 1024 * 50)).iterator).reduce((a,b)=> a).length`

For more details, please refer to [SPARK-14290](https://issues.apache.org/jira/browse/SPARK-14290)

Author: Zhang, Liye <[email protected]>

Closes #12296 from liyezhang556520/apache-branch-1.6-spark-14290.
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Apr 12, 2016
…emory copy in Netty's tran…

Author: Zhang, Liye <[email protected]>

Closes apache#12296 from liyezhang556520/apache-branch-1.6-spark-14290.

(cherry picked from commit baf2985)

Conflicts:
	network/common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java