Conversation

@danielmitterdorfer
Member

@danielmitterdorfer danielmitterdorfer commented Apr 10, 2018

With this commit we determine the maximum number of buffers that Netty
keeps while accumulating one HTTP request based on the maximum content
length. Previously, we kept the default value of 1024 which is too small
for bulk requests which leads to unnecessary copies of byte buffers
internally.

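For illustration, a minimal standalone sketch (not the actual Elasticsearch code) of how a limit derived this way can be applied to Netty's HttpObjectAggregator; the pipeline setup and the 100MB / 1500-byte values are assumptions:

```java
import io.netty.channel.ChannelInitializer;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.codec.http.HttpObjectAggregator;
import io.netty.handler.codec.http.HttpServerCodec;

public class AggregatorSizingSketch extends ChannelInitializer<SocketChannel> {
    private static final int MAX_CONTENT_LENGTH = 100 * 1024 * 1024; // assumed 100MB default
    private static final int ASSUMED_MTU = 1500;                     // assumed Ethernet MTU

    @Override
    protected void initChannel(SocketChannel ch) {
        // Roughly one buffer component per network packet of a maximally sized request.
        int maxComponents = MAX_CONTENT_LENGTH / ASSUMED_MTU; // 69905
        HttpObjectAggregator aggregator = new HttpObjectAggregator(MAX_CONTENT_LENGTH);
        // Caps the number of components Netty accumulates before consolidating them;
        // it does not pre-allocate any memory.
        aggregator.setMaxCumulationBufferComponents(maxComponents);
        ch.pipeline().addLast(new HttpServerCodec(), aggregator);
    }
}
```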
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@danielmitterdorfer
Member Author

I have now run several benchmarks in our benchmarking suite with different heap sizes. Below are the results for two extreme cases:

  • geopoint, which contains only very small documents. I ran the benchmarks with an increased bulk size of 50,000 docs per bulk (instead of 5,000).
  • pmc, where each document is one academic paper. pmc is rather I/O-intensive and has large bulk requests compared to geopoint.

I ran these benchmarks with different heap sizes to see for which configurations the max cumulation buffer changes have an effect. The results match the expectation that with these changes, Elasticsearch performs better under low-memory conditions while performing similarly to or slightly better than before when more heap memory is available. One data point where this is very apparent is that we could successfully finish pmc with a 768MB heap with these changes, whereas the benchmark candidate OOMEd after indexing 30% of the document corpus without them (indicated by indexing throughput set to zero in the graph below).

The graphs below show the achieved median indexing throughput for those tracks, along with error bars for the minimum and maximum throughput. In blue we see the current default of at most 1024 cumulation buffer components, whereas in orange we see the new default of 69905 cumulation buffer components, assuming the default HTTP max content length of 100MB (the value is derived from the max content length and the assumed MTU; see the code for details).
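For reference, a quick back-of-the-envelope check of the 69905 figure (assuming the 100MB default max content length and a 1500-byte MTU):

```java
public class ComponentEstimate {
    public static void main(String[] args) {
        long maxContentLengthBytes = 100L * 1024 * 1024; // 100MB default = 104,857,600 bytes
        long assumedMtuBytes = 1500;                     // assumed Ethernet MTU
        // Integer division: 104,857,600 / 1500 = 69905
        System.out.println(maxContentLengthBytes / assumedMtuBytes);
    }
}
```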

throughput_geopoint

throughput_pmc

I intend to merge these changes to master after a successful review, let them bake there for a while, and then backport them to 6.x.

Member

@jasontedor jasontedor left a comment

I left a question.

// Note that we are *not* pre-allocating any memory based on this setting but rather determine the CompositeByteBuf's capacity.
// The tradeoff is between fewer (but larger) buffers that are contained in the CompositeByteBuf and more (but smaller) buffers.
// With the default max content length of 100MB and an MTU of 1500 bytes we would allow 69905 entries.
long maxBufferComponentsEstimate = Math.round((double) (maxContentLength.getBytes() / MTU_ETHERNET.getBytes()));
Member

We can get the interface MTU from the NetworkInterface API; should we use that here?
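For reference, a rough sketch of such a lookup (a hypothetical helper, assuming the publish address is already known; it falls back to 1500 when the interface cannot be resolved):

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;

public class MtuLookupSketch {
    // Resolve the interface that owns the given (publish) address and read its MTU.
    static int mtuFor(InetAddress publishAddress) {
        try {
            NetworkInterface iface = NetworkInterface.getByInetAddress(publishAddress);
            return iface != null ? iface.getMTU() : 1500;
        } catch (SocketException e) {
            return 1500;
        }
    }

    public static void main(String[] args) {
        // Example: loopback typically reports a much larger MTU than Ethernet's 1500 bytes.
        System.out.println(mtuFor(InetAddress.getLoopbackAddress()));
    }
}
```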

Member Author

I had a look at this and I really like your idea. However, I am not sure it is feasible: I guess we need to resolve the network interface that is associated with the publish address (leveraging NetworkService).

While I also dislike having the MTU specified as a constant here (instead of retrieving it from the network interface), it serves as an input for an estimation. The "worst" practical MTU is 1500 bytes; another typical one is 65536 bytes for loopback (that's not always the case for loopback, but it is usually higher than for Ethernet). What happens if we assume an MTU of 1500 for our estimation in the loopback case? Our estimate would allow more buffer components than are expected for loopback. But in practice we do not expect to reach that theoretical capacity. A specific example for the loopback case (with a max content length of 100 MB):

  • Estimated maximum number of buffers: 100 MB / 1500 bytes = 69905
  • Actual maximum number of buffers: 100 MB / 65536 bytes = 1600

This means that we expect to reserve space for at most 1600 buffers in the HttpObjectAggregator but overestimate it to at most 69905 (which we do not reach in practice).

To summarize, I see two possibilities:

  • Determine the publish address with NetworkService, find the matching network interface and determine its MTU. This is correct but more complex.
  • Rename the constant MTU_ETHERNET to something like SMALLEST_EXPECTED_MTU (while RFC 791 states 68 bytes as the minimum, I'd stick to 1500 for practical purposes). It's not ideal but in very special cases, users can explicitly define this expert setting (http.netty.max_composite_buffer_components) directly.

Member Author

@jasontedor wdyt about the above?

Member

I think that jumbo frames are common enough that we should try to go the extra mile here. If it's not possible to do cleanly, I would at least like to see a system property that can be used to set the MTU to 9000.

Member Author

Thanks for the suggestion. I have now introduced a system property with a default of 1500 bytes.

//
// Note that we are *not* pre-allocating any memory based on this setting but rather determine the CompositeByteBuf's capacity.
// The tradeoff is between fewer (but larger) buffers that are contained in the CompositeByteBuf and more (but smaller) buffers.
// With the default max content length of 100MB and an MTU of 1500 bytes we would allow 69905 entries.
Member

It is a nit, but would you please use the multi-line comment style?

/*
 *
 */

Member Author

Sure. Addressed in b5c14fd.

@danielmitterdorfer
Member Author

@elasticmachine retest this please

@Tim-Brooks
Contributor

At some point it may be worth benchmarking this with TLS/SSL enabled, as some of the assumptions (1500-byte buffer sizes making it to the HttpObjectAggregator) may not hold. A full TLS/SSL packet (which could be 16 KB) will need to be decrypted prior to HTTP decoding. The risk of course is that larger component lists consume more heap. But I think it will not be a problem as Netty seems to start with lists of 16 components in its CompositeByteBufs and resizes them as necessary. Still, it might be worth checking. Or we could just monitor the security benchmarks after the merge.

@danielmitterdorfer
Member Author

Good point @tbrooks8. I ran the pmc benchmarks with x-pack now and the results are similar to Elasticsearch without x-pack (i.e. we still see a benefit from this change):

throughput_pmc x-pack

The risk of course is that larger component lists consume more heap.

The number that is adjusted here defines the point at which the original (small) buffers get copied into one larger buffer. We do not preallocate any memory based on that number. I agree that we might have more elements in the CompositeByteBuf, but based on the benchmark results I'd conclude that more entries in the internal array list are less problematic than the additional buffer copies.
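To illustrate that outside of Elasticsearch, a small standalone sketch with Netty's CompositeByteBuf (the cap of 4 and the tiny buffers are arbitrary demo values): the maxNumComponents argument only caps how many components accumulate before Netty consolidates them into a single buffer; nothing is reserved up front.

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.CompositeByteBuf;
import io.netty.buffer.Unpooled;

public class CompositeBufferSketch {
    public static void main(String[] args) {
        // Allow at most 4 components before Netty consolidates (copies) them into one buffer.
        CompositeByteBuf composite = Unpooled.compositeBuffer(4);
        for (int i = 0; i < 6; i++) {
            ByteBuf chunk = Unpooled.wrappedBuffer(new byte[] { (byte) i });
            composite.addComponent(true, chunk);
            // The component count stays bounded because exceeding the cap triggers a consolidating
            // copy (the exact counts printed depend on the Netty version).
            System.out.println("components after add " + i + ": " + composite.numComponents());
        }
        composite.release();
    }
}
```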

@danielmitterdorfer
Member Author

@elasticmachine retest this please

Member

@jasontedor jasontedor left a comment


I left one comment but this LGTM now.

*
* By default we assume the Ethernet MTU (1500 bytes) but users can override it with a system property.
*/
private static final ByteSizeValue MTU = new ByteSizeValue(Long.valueOf(System.getProperty("es.net.mtu", "1500")));
Member

Nit: it does not really matter here, but this boxes unnecessarily and should instead be Long.parseLong.
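For readers wondering about the difference: Long.parseLong returns a primitive long, whereas Long.valueOf returns a boxed java.lang.Long that is then auto-unboxed when passed to the ByteSizeValue constructor. A trivial illustration:

```java
public class ParseVsValueOf {
    public static void main(String[] args) {
        long viaParse = Long.parseLong("1500");   // primitive long, no boxing
        long viaValueOf = Long.valueOf("1500");   // boxed Long allocated, then auto-unboxed
        System.out.println(viaParse == viaValueOf); // true; only the intermediate allocation differs
    }
}
```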

Member Author

Thanks, I fixed it in 635c9fb.

Member

🚢

@danielmitterdorfer
Member Author

Thanks for the review @jasontedor and for your comments @tbrooks8! I'll let this bake on master for a while and will then backport it to 6.x.

@danielmitterdorfer
Member Author

@elasticmachine retest this please

@danielmitterdorfer danielmitterdorfer merged commit 09cf530 into elastic:master May 11, 2018
@danielmitterdorfer danielmitterdorfer deleted the max-composite-buffer-sizing branch May 11, 2018 08:01
@jasontedor
Member

@danielmitterdorfer I do not think this change should be backported to the 6.3 branch.

@jasontedor jasontedor removed the v6.3.1 label May 11, 2018
danielmitterdorfer added a commit that referenced this pull request May 15, 2018
With this commit we determine the maximum number of buffers that Netty keeps while
accumulating one HTTP request based on the maximum content length and the assumed MTU
(default 1500 bytes, overridable with the system property `es.net.mtu`). Previously, we
kept the default value of 1024, which is too small for bulk requests and leads to
unnecessary copies of byte buffers internally.

Relates #29448
@danielmitterdorfer
Member Author

Agreed w.r.t. releases. Backported to 6.x in 71d3297.
