
Conversation

@liyezhang556520
Contributor

What changes were proposed in this pull request?

In this patch, we set the initial `maxNumComponents` to `Integer.MAX_VALUE` instead of the default size (which is 16) when allocating the `compositeBuffer` in `TransportFrameDecoder`, because the `compositeBuffer` introduces too many underlying memory copies with the default `maxNumComponents` when the frame size is large (which results in many transport messages). For details, please refer to SPARK-14242.

How was this patch tested?

Spark unit tests and manual tests.
For manual tests, we can reproduce the performance issue with the following code:
`sc.parallelize(Array(1,2,3),3).mapPartitions(a=>Array(new Array[Double](1024 * 1024 * 50)).iterator).reduce((a,b)=> a).length`
The performance gain is easy to see in both running time and CPU usage.
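
For background, here is a minimal standalone demo of the consolidation cost described above. The class name `ConsolidationDemo` is hypothetical, the exact consolidation cadence depends on the Netty 4.x version in use, and this is a sketch rather than the actual `TransportFrameDecoder` code:

```java
import io.netty.buffer.CompositeByteBuf;
import io.netty.buffer.Unpooled;

public final class ConsolidationDemo {
  public static void main(String[] args) {
    // compositeBuffer() defaults maxNumComponents to 16: once the component
    // count exceeds it, the composite consolidates, i.e. copies everything
    // accumulated so far into one fresh buffer. With Integer.MAX_VALUE that
    // consolidation never happens.
    CompositeByteBuf defaultMax = Unpooled.compositeBuffer();
    CompositeByteBuf hugeMax = Unpooled.compositeBuffer(Integer.MAX_VALUE);
    for (int i = 0; i < 64; i++) {
      defaultMax.addComponent(Unpooled.wrappedBuffer(new byte[32 * 1024]));
      hugeMax.addComponent(Unpooled.wrappedBuffer(new byte[32 * 1024]));
    }
    // With the Netty behavior current at the time of this PR, defaultMax has
    // consolidated (copied its data) several times along the way, while
    // hugeMax still references all 64 chunks in place. A frame over 1GB
    // arriving in ~40000 chunks repeats that copy cost many times over.
    System.out.println("default maxNumComponents: " + defaultMax.numComponents() + " components");
    System.out.println("Integer.MAX_VALUE:        " + hugeMax.numComponents() + " components");
    defaultMax.release();
    hugeMax.release();
  }
}
```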

}

// Otherwise, create a composite buffer.
CompositeByteBuf frame = buffers.getFirst().alloc().compositeBuffer();
Contributor Author

Actually we can set the maxNumComponents for the compositeBuffer to avoid the underlying consolidation, e.g. `CompositeByteBuf frame = buffers.getFirst().alloc().compositeBuffer(Integer.MAX_VALUE);`, but this might not be a good choice.

Contributor

Why is it not a good choice? With your change, you're replacing "maybe copy multiple times" with "always copy once". If there's a way to avoid the copy altogether, why not do it?

Contributor Author

@vanzin, I'm not sure why Netty sets a maximum number of components at all (the max is Integer.MAX_VALUE) and defaults it to only 16, which seems very small before consolidation kicks in. Will having too many small buffers under the compositeBuffer cause other problems? Is that why it consolidates once the number of small buffers reaches maxNumComponents?

@SparkQA

SparkQA commented Mar 29, 2016

Test build #54441 has finished for PR 12038 at commit 8908585.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Member

zsxwing commented Mar 29, 2016

Maybe we can try to use CompositeByteBuf.addComponents? That way we avoid the copy when the number of buffers is not more than 16.
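
A minimal sketch of that idea, assuming Netty 4.x's `CompositeByteBuf` API; the helper names (`FrameAssembler`, `assembleFrame`) are hypothetical and this is not the actual `TransportFrameDecoder` patch:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.CompositeByteBuf;

import java.util.List;

final class FrameAssembler {
  private FrameAssembler() {}

  /**
   * Stitch the buffered chunks of one frame together. Collecting the chunks
   * first and handing them to addComponents() in a single call means the
   * composite's consolidation (a full copy) runs at most once, and not at all
   * when the frame spans 16 or fewer chunks.
   */
  static ByteBuf assembleFrame(List<ByteBuf> chunks, int frameLen) {
    CompositeByteBuf frame = chunks.get(0).alloc().compositeBuffer();
    frame.addComponents(chunks);
    // addComponents() does not advance the writer index, so set it explicitly.
    frame.writerIndex(frameLen);
    return frame;
  }
}
```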

@zsxwing
Member

zsxwing commented Mar 29, 2016

cc @vanzin

@liyezhang556520
Contributor Author

@vanzin, I think @zsxwing's idea of using CompositeByteBuf.addComponents is a better choice: it introduces exactly one copy if the number of small buffers is larger than 16, and no copy at all otherwise. Let me update this PR first.

@liyezhang556520 liyezhang556520 changed the title from "[SPARK-14242][CORE][Network] avoid using compositeBuffer for frame decoder" to "[SPARK-14242][CORE][Network] avoid too many copies in compositeBuffer for frame decoder" on Mar 30, 2016
@SparkQA

SparkQA commented Mar 30, 2016

Test build #54492 has finished for PR 12038 at commit 1eacf55.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Mar 30, 2016

it introduces exactly one copy if the number of small buffers is larger than 16

That's better, but is it needed at all? I don't see any comments about why consolidating the buffers is a win in the source for CompositeByteBuf. Traversing the single buffer should be slightly faster because there's less bookkeeping, but there's the cost of copying that data in the first place.

When testing this code, I remember that during large transfers packets would arrive in 64k chunks at the most, so that means that once you're transferring more than 1MB, you'd have to copy things.

Have you tried not consolidating to see whether there's any negative side-effect?

@liyezhang556520
Contributor Author

@vanzin

That's better, but is it needed at all? I don't see any comments about why consolidating the buffers is a win in the source for CompositeByteBuf. Traversing the single buffer should be slightly faster because there's less bookkeeping, but there's the cost of copying that data in the first place.

If so, we can just set maxNumComponents to Integer.MAX_VALUE for the compositeBuffer.

When testing this code, I remember that during large transfers packets would arrive in 64k chunks at the most, so that means that once you're transferring more than 1MB, you'd have to copy things.

In my test, the chunk sizes are mainly around 20~30 KB.

Have you tried not consolidating to see whether there's any negative side-effect?

I previously tested with `buffers.getFirst().alloc().compositeBuffer(Integer.MAX_VALUE);`, with a frame size over 1GB consisting of about 40000 chunks, and I didn't see any negative side effect.

OK, let's solve this issue without any copy in any case.
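
A hedged sketch of that "no copy in any case" outcome, built around the `compositeBuffer(Integer.MAX_VALUE)` allocation quoted earlier in this thread; the surrounding loop and names here are an approximation, not the verbatim `TransportFrameDecoder` source:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.CompositeByteBuf;

import java.util.List;

final class NoCopyFrameAssembler {
  private NoCopyFrameAssembler() {}

  // With maxNumComponents = Integer.MAX_VALUE the composite never consolidates,
  // so every chunk is referenced in place and no chunk data is copied,
  // regardless of how many chunks make up the frame.
  static ByteBuf assemble(List<ByteBuf> chunks, int frameLen) {
    CompositeByteBuf frame = chunks.get(0).alloc().compositeBuffer(Integer.MAX_VALUE);
    for (ByteBuf chunk : chunks) {
      frame.addComponent(chunk);
    }
    // addComponent() does not advance the writer index, so set it explicitly.
    frame.writerIndex(frameLen);
    return frame;
  }
}
```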

@liyezhang556520 liyezhang556520 changed the title from "[SPARK-14242][CORE][Network] avoid too many copies in compositeBuffer for frame decoder" to "[SPARK-14242][CORE][Network] avoid copy in compositeBuffer for frame decoder" on Mar 31, 2016
@SparkQA

SparkQA commented Mar 31, 2016

Test build #54578 has finished for PR 12038 at commit 80f7573.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liyezhang556520
Contributor Author

retest this please.

@SparkQA

SparkQA commented Mar 31, 2016

Test build #54580 has finished for PR 12038 at commit 80f7573.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Mar 31, 2016

@liyezhang556520 looks great but could you update the commit message to reflect the actual change? Thanks

@zsxwing
Member

zsxwing commented Mar 31, 2016

LGTM. @liyezhang556520, please ping me when you update the PR description.

@liyezhang556520
Contributor Author

@zsxwing, I updated the commit description. Thank you @zsxwing and @vanzin for reviewing.

@zsxwing
Member

zsxwing commented Apr 1, 2016

Merging to master. Thanks, @liyezhang556520!

@asfgit asfgit closed this in 96941b1 Apr 1, 2016
zzcclp pushed a commit to zzcclp/spark that referenced this pull request Apr 1, 2016
…rame decoder apache#12038

[EXT][SPARK-13583][CORE][STREAMING] Remove unused imports and add checkstyle rule apache#11438
@davies
Contributor

davies commented Apr 11, 2016

cherry-picked into 1.6 branch

@liyezhang556520 liyezhang556520 deleted the spark-14242 branch April 11, 2016 08:16
@zzcclp
Contributor

zzcclp commented Apr 11, 2016

@davies, has this PR been cherry-picked into branch-1.6?

@davies
Contributor

davies commented Apr 11, 2016

Yes

@zzcclp
Contributor

zzcclp commented Apr 12, 2016

@davies, but I didn't find this commit in branch-1.6.

@liyezhang556520
Contributor Author

@davies, I didn't see the commit in branch-1.6 either; it seems this commit cannot simply be cherry-picked with git because the file path is not the same in master and branch-1.6. Do I need to submit another PR for the back-port?

asfgit pushed a commit that referenced this pull request Apr 12, 2016
…decoder

Author: Zhang, Liye <[email protected]>

Closes #12038 from liyezhang556520/spark-14242.
@davies
Contributor

davies commented Apr 12, 2016

Sorry, I forgot to push. It's in branch-1.6 now.

zzcclp pushed a commit to zzcclp/spark that referenced this pull request Apr 12, 2016
…decoder

Author: Zhang, Liye <[email protected]>

Closes apache#12038 from liyezhang556520/spark-14242.

(cherry picked from commit 663a492)