
Conversation

@original-brownbear
Contributor

@original-brownbear original-brownbear commented Jul 18, 2019

  • Copying the request is not necessary for requests that don't escape their content BytesReference in a way that keeps it referenced after the request has been responded to. We can simply release the content once the response has been generated and thereby avoid a lot of Unpooled allocations for many, if not most, requests (see the sketch after this list). For now, this PR makes it so that search and bulk requests are not copied to Unpooled buffers.
  • Relates Reduce garbage for requests with unpooled buffers #32228
    • I think the issue that prevented that PR from being merged was solved by Optimize Bulk Message Parsing and Message Length Parsing #39634, which moved the bulk index marker search to ByteBuf bulk access, so the composite buffer shouldn't require many additional bounds checks (I'd argue the bounds checks we add are offset by the ones we'd otherwise pay when copying the composite buffer).
  • I couldn't necessarily reproduce much of a speedup from this change (though no slowdown either), but I could reproduce a very measurable reduction in GC time with e.g. Rally's PMC (a 4g-heap node with bulk requests of size 5k saw a ~10% reduction in young GC time and a ~50% reduction in old-gen GC time for me).
  • I think this also improves the efficacy of the real-memory circuit breaker quite a bit, since it strengthens the correlation between request size and the actual memory used by the request (we no longer have a bunch of dangling, to-be-GCed request copies hanging around).
  • Also somewhat relates Throttle Network Reads on Memory Pressure #44484
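
A minimal sketch of the idea in plain Netty terms (illustrative code, not the actual Elasticsearch implementation; the method names are made up): instead of copying the pooled request content into an `Unpooled` heap buffer up front, the pooled buffer is kept and released back to the pool once the response has been generated.

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;

public final class RequestContentLifecycle {

    // Old approach: copy the pooled content into an unpooled heap buffer so the
    // pooled buffer can be released immediately; the copy then lingers as garbage
    // for the GC after the request has been handled.
    static ByteBuf copyAndRelease(ByteBuf pooledContent) {
        ByteBuf unpooledCopy = Unpooled.copiedBuffer(pooledContent);
        pooledContent.release();
        return unpooledCopy;
    }

    // Rough idea of this PR: keep the pooled buffer and release it only once the
    // response for this request has been generated, avoiding the extra allocation.
    static void releaseOnResponse(ByteBuf pooledContent, Runnable generateResponse) {
        try {
            generateResponse.run(); // the handler reads pooledContent here
        } finally {
            pooledContent.release(); // buffer returns to the pool
        }
    }
}
```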

@elasticmachine
Collaborator

Pinging @elastic/es-distributed

@Tim-Brooks
Contributor

This PR relies on the fact that all REST actions will be done with the request content by the time they send a response. I'm not sure that this is a safe assumption.

@original-brownbear
Contributor Author

@tbrooks8 it appears there's only a single spot where this doesn't hold: https://github.com/elastic/elasticsearch/pull/44564/files#diff-05216fd3df413bda2df9486bce7a4e29R51

I would also argue that holding on to these bytes without copying is something that shouldn't be happening implicitly anywhere. Otherwise the whole memory behavior of the REST layer becomes pretty unpredictable. In the case of storing the pipeline source it's ok to use the request body for that as a special case, but if we generally allow request contents to be referenced beyond responding to the request the whole notion of the circuit breaker becomes wrong imo.
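
To illustrate the special case mentioned above: if a handler genuinely needs the body after the response has been sent (as with the pipeline source), it should make its own explicit copy rather than silently retaining the pooled request buffer. A hypothetical sketch (the method name is illustrative, not from the codebase):

```java
import io.netty.buffer.ByteBuf;

final class RetainBeyondResponse {

    // Explicitly copy the readable bytes so the result stays valid after the
    // pooled request buffer has been released back to the pool.
    static byte[] copyForLaterUse(ByteBuf requestContent) {
        byte[] ownCopy = new byte[requestContent.readableBytes()];
        requestContent.getBytes(requestContent.readerIndex(), ownCopy);
        return ownCopy;
    }
}
```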

@original-brownbear
Contributor Author

Jenkins run elasticsearch-ci/packaging-sample

@original-brownbear original-brownbear marked this pull request as ready for review July 19, 2019 14:16
@Tim-Brooks
Contributor

This probably requires a team discuss.

@original-brownbear
Contributor Author

Sure, added team-discuss :)

@original-brownbear
Contributor Author

Putting this back into WIP, thanks @tbrooks8 for pointing out that org.elasticsearch.transport.netty4.ByteBufBytesReference#toBytesRef is not safe here.

@original-brownbear
Contributor Author

Thanks Yannick. I simplified the handling of content in 1812be4, which also simplified the concurrency of this object. As explained above, it was kind of pointless to optimize the unpooled case by setting the content to null this way.

Not sure about a test for the unpooled buffer case, given the effort required for that. Maybe we could add those (tricky) tests in a follow-up and use the logic from #44881, which allows for more holistically optimizing the Unpooled case across all message sizes?

Contributor

@ywelsch ywelsch left a comment


LGTM. Let's have @tbrooks8 ok this again as well.

@original-brownbear
Contributor Author

Linking #49699, which most likely shows an OOM caused by the copies avoided here.

@original-brownbear
Contributor Author

Ping @tbrooks8: there shouldn't be much left to do here since the changes from the last time you looked at it are minor :) Thanks!

Contributor

@Tim-Brooks Tim-Brooks left a comment


I guess I missed this in the last review cycle, but I don't understand why we completely reverted this for NIO?

I thought you were going the direction of always copying instead of doing the cast optimization.

I think we should still use the existing buffer when allowsUnsafeBuffers returns true. It's just that we should not attempt the non-copy when allowsUnsafeBuffers returns false and we think the buffer is unpooled (since we do not reliably know that).

I guess we can bring NIO back in a follow-up so I'll approve this.
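
A sketch of the rule described here, assuming a boolean flag equivalent to the handler's allowsUnsafeBuffers setting (illustrative code, not the actual implementation):

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;

final class BufferHandoff {

    // Hand the existing (possibly pooled) buffer to the handler only when it
    // opts in via allowsUnsafeBuffers; otherwise take a defensive copy, since
    // we cannot reliably tell whether the buffer is unpooled.
    static ByteBuf contentFor(boolean allowsUnsafeBuffers, ByteBuf existing) {
        if (allowsUnsafeBuffers) {
            return existing; // zero-copy hand-off, released once the response is sent
        }
        ByteBuf copy = Unpooled.copiedBuffer(existing); // safe, heap-backed copy
        existing.release();
        return copy;
    }
}
```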

@original-brownbear
Contributor Author

@tbrooks8 thanks!

I guess I missed this in the last review cycle, but I don't understand why we completely reverted this for NIO?

🤦‍♂️ ... I completely misread your comment on the NIO situation back in the initial review ... will open the follow-up tomorrow :)

@original-brownbear original-brownbear merged commit 5ddf920 into elastic:master Dec 3, 2019
@original-brownbear original-brownbear deleted the never-copy-http-buffer branch December 3, 2019 21:44
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Dec 3, 2019
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Dec 4, 2019
@original-brownbear
Contributor Author

NIO version in #49819 :)

original-brownbear added a commit that referenced this pull request Dec 4, 2019
SivagurunathanV pushed a commit to SivagurunathanV/elasticsearch that referenced this pull request Jan 23, 2020
original-brownbear added a commit that referenced this pull request Jan 24, 2020
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Jan 24, 2020
original-brownbear added a commit that referenced this pull request Jan 24, 2020
@original-brownbear original-brownbear restored the never-copy-http-buffer branch August 6, 2020 18:33