
Conversation

@Tim-Brooks (Contributor)

Elasticsearch does not grant Netty reflection access to get Unsafe. The
only mechanism that currently exists to free direct buffers in a timely
manner is to use Unsafe. This leads to the occasional scenario, under heavy network load, in which direct byte buffers slowly build up without being freed.

This commit disables Netty direct buffer pooling and moves to a strategy
of using a single thread-local direct buffer for interfacing with sockets.
This will reduce the memory usage from networking. Elasticsearch
currently derives very little value from direct buffer usage (TLS,
compression, Lucene, Elasticsearch handling, etc. all use heap bytes). So
this seems like the correct trade-off until that changes.
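
In sketch form, the per-thread buffer strategy looks roughly like this (illustrative class and method names, and an assumed 1 MB buffer size; not the exact code added by this commit):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

import io.netty.buffer.ByteBuf;

// Illustrative sketch, not the class added by this PR: one fixed-size direct buffer
// per I/O thread is reused for every socket read, and the bytes are then copied into
// a heap-backed ByteBuf, so no per-read direct allocation takes place.
final class ThreadLocalIoBuffer {

    private static final ThreadLocal<ByteBuffer> IO_BUFFER =
        ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(1024 * 1024)); // assumed 1 MB

    static int readToHeap(SocketChannel channel, ByteBuf destination) throws IOException {
        ByteBuffer ioBuffer = IO_BUFFER.get();
        ioBuffer.clear();                        // reuse the same direct buffer for every read
        int bytesRead = channel.read(ioBuffer);  // the kernel fills the direct buffer
        ioBuffer.flip();
        if (bytesRead > 0) {
            destination.writeBytes(ioBuffer);    // copy the bytes into the heap ByteBuf
        }
        return bytesRead;
    }
}
```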

@Tim-Brooks Tim-Brooks added :Distributed Coordination/Network Http and internode communication implementations team-discuss v8.0.0 v7.4.0 labels Jul 24, 2019
@Tim-Brooks Tim-Brooks requested review from jasontedor and s1monw July 24, 2019 22:36
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed

@Tim-Brooks (Contributor, Author) commented Jul 24, 2019

Every Netty channel has a buffer for outgoing writes. Immediately prior to putting outgoing messages in this buffer, Netty attempts to wrap the heap bytes (always the case for ES) in direct bytes. Unfortunately, due to friction between ES and Netty, both the heap bytes and the direct bytes are retained until the flush is complete, which increases memory usage. Additionally, having direct buffer pooling enabled means 16 MB direct chunks per thread. This can scale to gigabytes under load.

We have remediated the OOM issue by rate limiting writes (for the internal transport) and setting JVM options to force GC collections at certain levels of direct memory usage. But IMO I don't see significant value in having this direct memory pooling enabled when we are currently so heap-bytes oriented. The strategy in this PR (1 MB IO buffer per thread + copying) is the strategy we use for transport-nio, and we see that transport-nio performs similarly (or slightly better) on our nightlies. I think that a strategy that significantly reduces memory usage in exchange for the possibility of more copying is the correct strategy.
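
As a rough sketch of what copying through a fixed per-thread direct buffer means on the write side (names and structure are illustrative, not the PR's actual code):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

import io.netty.buffer.ByteBuf;

// Sketch only: copy outgoing heap bytes through a reused, fixed-size direct buffer in
// bounded chunks, so the per-thread direct memory footprint stays constant regardless
// of message size. Written blocking-style for simplicity.
final class ThreadLocalIoWriter {

    static void writeFromHeap(SocketChannel channel, ByteBuf source, ByteBuffer ioBuffer)
            throws IOException {
        while (source.isReadable()) {
            ioBuffer.clear();
            int chunk = Math.min(source.readableBytes(), ioBuffer.capacity());
            ioBuffer.limit(chunk);
            source.readBytes(ioBuffer);      // copy a bounded chunk of heap bytes into the direct buffer
            ioBuffer.flip();
            while (ioBuffer.hasRemaining()) {
                channel.write(ioBuffer);     // hand the direct buffer to the socket
            }
        }
    }
}
```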

I had opened a PR (netty/netty#9183) with Netty that would allow us to set our own Unsafe-based cleaner so that Netty buffer pooling could be used. However, it looks like Netty is not going to accept the PR. The most recent suggestions are (1) we give them access to Unsafe and they create a code path that uses it only for cleaning, and (2) we completely rebuild the memory allocators. I assume that 1 is not going to get much support, and 2 is a very significant commitment when I don't think we currently need it.

@Tim-Brooks Tim-Brooks added the WIP label Jul 24, 2019
@Tim-Brooks (Contributor, Author)

Also, I added the team-discuss label so we will discuss this in a distributed meeting. I tagged Jason and Simon so that you two are aware that I would like to consider this.

@Tim-Brooks Tim-Brooks removed the WIP label Jul 31, 2019
@Tim-Brooks (Contributor, Author)

This is ready for review.

@s1monw (Contributor) left a comment

FWIW I think this makes a lot of sense to me.

@original-brownbear (Contributor) left a comment

Looks just fine overall, thanks @tbrooks8! I added a few comments on testing, docs and minor optimizations.

systemProperty 'es.set.netty.runtime.available.processors', 'false'

// Disable direct buffer pooling as it is disabled by default in Elasticsearch
systemProperty 'io.netty.allocator.numDirectArenas', '0'
Contributor

Shouldn't we maybe randomise this for tests via the test seed, since it has non-trivial effects on what code actually executes?

Contributor (Author)

I'm not clear how to accomplish this. I spoke to @mark-vieira and he indicated there is not a great way to randomize based on a seed in a gradle file. I can apply the system property in ESTestCase, but the system properties are set prior to the random utils being available.

I'm also not clear on the extent to which I should invest time in this. I think our expectation is that these system properties should not be messed with unless you really know what you are doing. But if you have some straightforward way in mind to improve coverage, let me know.

Contributor

We do this for Azure:

https://github.com/elastic/elasticsearch/blob/master/plugins/repository-azure/qa/microsoft-azure-storage/build.gradle#L85

so maybe just something like:

Long.parseUnsignedLong(project.rootProject.testSeed.tokenize(':').get(0), 16) % 2 == 0

to get a 50:50 split boolean? @mark-vieira any reason not to do this? (I think we do exactly this in multiple places already.)
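
In plain Java the parity check amounts to something like this (the seed value and class name are made up for illustration):

```java
// Hypothetical, stand-alone rendering of the Groovy one-liner; the seed value is made up.
public final class SeedSplitExample {
    public static void main(String[] args) {
        String testSeed = "DEADBEEF12345678:CAFEBABE";   // assumed "runnerSeed:methodSeed" format
        String rootSeed = testSeed.split(":")[0];        // same token as tokenize(':').get(0)
        boolean evenSeed = Long.parseUnsignedLong(rootSeed, 16) % 2 == 0;
        // A build script could set io.netty.allocator.numDirectArenas to '0' or leave the
        // default based on this boolean, giving a 50:50 split across test seeds.
        System.out.println("disable direct arenas for this seed: " + evenSeed);
    }
}
```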

systemProperty 'es.set.netty.runtime.available.processors', 'false'

// Disable direct buffer pooling as it is disabled by default in Elasticsearch
systemProperty 'io.netty.allocator.numDirectArenas', '0'
Contributor

Same here, maybe randomise here?

@original-brownbear (Contributor) left a comment

LGTM thanks @tbrooks8 !

Would be interesting to hear what @mark-vieira thinks about our test seed hack in the build files, but even if he's fine with it I think we can just add that in a follow-up -> no need to delay things here :)

@Tim-Brooks (Contributor, Author)

Would be interesting to hear what @mark-vieira thinks about our test seed hack in the build files, but even if he's fine with it I think we can just add that in a follow-up -> no need to delay things here :)

I'll open a follow-up. It probably also makes sense to randomize pooled/unpooled since we set those to different values based on JVM ergonomics. So we can have a PR dedicated to just randomizing the system properties.

@Tim-Brooks (Contributor, Author)

@elasticmachine run elasticsearch-ci/1

@Tim-Brooks (Contributor, Author)

@elasticmachine run elasticsearch-ci/bwc

@Tim-Brooks (Contributor, Author)

@elasticmachine run elasticsearch-ci/packaging-sample

@Tim-Brooks Tim-Brooks merged commit e0f9d61 into elastic:master Aug 8, 2019
Tim-Brooks added a commit that referenced this pull request Aug 8, 2019
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Oct 9, 2019
Pooled direct Netty buffers can destabilize ES easily starting in v7.4.0 due to the move to Netty 4.1.38, which uses direct buffers for all IO allocations by default. If you don't have direct buffer pooling disabled, you won't run the new IO path from elastic#44837, and large messages (e.g. huge bulk requests) will cause allocation of unpooled direct byte buffers the size of the message, often leading to the cluster slowly going OOM.

People who upgrade their ES cluster but keep their jvm.options file won't have the now-default `-Dio.netty.allocator.numDirectArenas=0` set and might run into memory trouble.
int bytesRead = javaChannel().read(ioBuffer);
ioBuffer.flip();
if (bytesRead > 0) {
    byteBuf.writeBytes(ioBuffer);
@occho commented Nov 16, 2019

I'm thinking this line possibly expands byteBuf capacity. That behavior is different from NioSocketChannel, which tries to ensure the expansion does not happen.
Is this acceptable? What are the risks of it?

https://github.com/netty/netty/blob/netty-4.1.38.Final/transport/src/main/java/io/netty/channel/socket/nio/NioSocketChannel.java#L347

We experienced an issue after upgrading an ES cluster from 7.1.1 to 7.4.2. When a bunch of index creations happen, like hundreds of indices, master/data nodes start to leave the cluster and the cluster status becomes red. We are still investigating what is actually happening, and hopefully we can submit a report about the issue.

When we first ran ES 7.4.2, jvm.options did not include the -Dio.netty.allocator.numDirectArenas=0 line. After we observed an OOM issue, we added the flag, and since then the OOM issue has not happened. But it looks like the issue with index creation started after that.

Contributor

@occho since our ioBuffer is of fixed size and we are not using any read throttling at the moment, I believe this change did not introduce a new risk in terms of memory allocation. If anything, the change should make the memory use of the networking layer more predictable. As you experienced yourself, setting -Dio.netty.allocator.numDirectArenas=0 fixed any OOM issues.

We are still investigating what is actually happening, and hopefully we can submit a report about the issue.

Thanks for looking into that. If you feel like you don't have enough details to open a GitHub issue around a reproducible bug, feel free to share what you have (logs etc.) on our Discuss forums.

@occho

Understood. Thank you for clarifying that!

@Tim-Brooks Tim-Brooks deleted the disable_direct_pooling branch December 18, 2019 14:54