Disable netty direct buffer pooling by default #44837
Conversation
Pinging @elastic/es-distributed
Every Netty channel has a buffer for outgoing writes. Immediately prior to putting outgoing messages in this buffer, netty attempts to wrap heap bytes (always the case for ES) into direct bytes. Unfortunately, due to friction between ES and netty, both the heap bytes and the direct bytes are retained until the flush is complete, so this increases memory usage. Additionally, having direct buffer pooling enabled means 16 MB direct chunks per thread, which can scale to gigabytes under load. We have remediated the OOM issue by rate limiting writes (for the internal transport) and setting JVM options to force GC collections at certain levels of direct memory usage. But, IMO, I don't really see significant value in having this direct memory pooling enabled when we are currently so heap-bytes oriented.
The strategy in this PR (1 MB IO buffer per thread + copying) is the strategy we use for transport-nio, and we see that transport-nio performs similarly (or slightly better) on our nightlies. I think that a strategy that significantly reduces memory usage in exchange for the possibility of more copying is the correct one.
I had attempted to open a PR (netty/netty#9183) with netty that would allow us to set our own Unsafe cleaner so that netty buffer pooling could be used. However, it looks like Netty is not going to accept the PR. The most recent suggestions are 1. we give them access to Unsafe and they will create a code path to only use it for cleaning, and 2. we completely rebuild the memory allocators. I assume that 1 is not going to get much support, and 2 is a very significant commitment when I don't think we need it currently.
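For illustration, here is a minimal sketch of the "fixed per-thread direct buffer + copying" write strategy described above. This is not the actual CopyBytesSocketChannel implementation; the class and method names are made up, and the 1 MB size is just the figure mentioned in the description.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// Illustrative only: copies heap bytes through a fixed, per-thread direct buffer
// instead of wrapping them in pooled direct buffers for the lifetime of the flush.
final class CopyBytesWriteSketch {

    // One direct buffer per thread; its size bounds the direct memory the
    // networking layer holds per thread (1 MB here, matching the description above).
    private static final int IO_BUFFER_SIZE = 1 << 20;
    private static final ThreadLocal<ByteBuffer> IO_BUFFER =
        ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(IO_BUFFER_SIZE));

    // Hypothetical helper: copy heap bytes into the thread-local direct buffer
    // and write them to the channel in chunks of at most IO_BUFFER_SIZE bytes.
    static void writeHeapBytes(WritableByteChannel channel, byte[] heapBytes) throws IOException {
        ByteBuffer ioBuffer = IO_BUFFER.get();
        int offset = 0;
        while (offset < heapBytes.length) {
            ioBuffer.clear();
            int chunk = Math.min(ioBuffer.remaining(), heapBytes.length - offset);
            ioBuffer.put(heapBytes, offset, chunk);
            ioBuffer.flip();
            while (ioBuffer.hasRemaining()) {
                channel.write(ioBuffer); // a real channel may do partial writes; loop until drained
            }
            offset += chunk;
        }
    }
}
```

The trade-off is exactly the one described above: the extra copy into the fixed buffer is paid in exchange for never holding more than a small, bounded amount of direct memory per thread.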
Also I added team-discuss so we will discuss this in a distributed meeting. But I tagged Jason and Simon so that you two are aware that I would like to consider this.
This is ready for review. |
s1monw left a comment
FWIW I think this makes a lot of sense to me.
original-brownbear left a comment
Looks just fine overall, thanks @tbrooks8! I added a few comments on testing, docs and minor optimizations.
systemProperty 'es.set.netty.runtime.available.processors', 'false'

// Disable direct buffer pooling as it is disabled by default in Elasticsearch
systemProperty 'io.netty.allocator.numDirectArenas', '0'
Shouldn't we maybe randomise this for tests via the test seed, since this has non-trivial effects on what code actually executes?
I'm not clear how to accomplish this. I spoke to @mark-vieira and he indicated there is not a great way to randomize based on a seed in a gradle file. I can apply the system property in ESTestCase, but the system properties are set prior to the random utils being available.
I'm also not clear on the extent to which I should invest time in this. I think our expectation is that these system properties should not really be messed with unless you really know what you are doing. But if you have some straightforward way in mind to improve coverage, let me know.
We do this for Azure:
so maybe just something like:
Long.parseUnsignedLong(project.rootProject.testSeed.tokenize(':').get(0), 16) % 2 == 0
to get a 50:50 split boolean? @mark-vieira any reason not to do this? (I think we do exactly this in multiple places already.)
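As a plain-Java illustration of the seed-parity idea (the actual suggestion above lives in the Gradle build script), the sketch below derives a deterministic 50:50 boolean from the first hex token of a test seed. The seed format, method name, and example value are assumptions for illustration only.

```java
// Illustrative sketch: derive a deterministic boolean from a test seed of the
// form "<hex>[:<hex>...]", mirroring the Gradle testSeed property mentioned above.
final class SeedParitySketch {

    static boolean evenSeed(String testSeed) {
        long seed = Long.parseUnsignedLong(testSeed.split(":")[0], 16);
        return seed % 2 == 0; // 50:50 split that is stable for a given seed
    }

    public static void main(String[] args) {
        // Hypothetical seed value, just to show the call.
        System.out.println(evenSeed("3A1F9C0B7D2E4F56"));
    }
}
```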
systemProperty 'es.set.netty.runtime.available.processors', 'false'

// Disable direct buffer pooling as it is disabled by default in Elasticsearch
systemProperty 'io.netty.allocator.numDirectArenas', '0'
Same here, maybe randomise here?
original-brownbear left a comment
LGTM, thanks @tbrooks8!
Would be interesting to hear what @mark-vieira thinks about our test seed hack in the build files, but even if he's fine with it I think we can just add that in a follow-up -> no need to delay things here :)
I'll open a follow-up. It probably also makes sense to randomize pooled/unpooled since we set those to different values based on JVM ergonomics. So we can have a PR dedicated to just randomizing the system properties.
@elasticmachine run elasticsearch-ci/1
@elasticmachine run elasticsearch-ci/bwc
@elasticmachine run elasticsearch-ci/packaging-sample |
Pooled direct Netty buffers can destabilize ES easily starting in v7.4.0, due to the move to Netty 4.1.38, which uses direct buffers for all IO allocations by default. If you don't have direct buffer pooling disabled, you won't run the new IO path from elastic#44837, and large messages (e.g. huge bulk requests) will cause allocation of unpooled direct byte buffers the size of the message, which will often lead to the cluster slowly going OOM. People who upgrade their ES cluster but keep their jvm.options file won't have the now-default `-Dio.netty.allocator.numDirectArenas=0` set and might run into memory trouble.
int bytesRead = javaChannel().read(ioBuffer);
ioBuffer.flip();
if (bytesRead > 0) {
    byteBuf.writeBytes(ioBuffer);
I'm thinking this line possibly expands byteBuf's capacity. That behavior is different from NioSocketChannel, which tries to ensure the expansion does not happen.
Is this acceptable? What are the risks of it?
We experienced an issue after upgrading an ES cluster from ES 7.1.1 to 7.4.2. When a bunch of index creations happen, like hundreds of indices, master/data nodes start to leave the cluster and the cluster status becomes red. We are still investigating what is actually happening, and hopefully we can submit a report about the issue.
When we first ran ES 7.4.2, jvm.options did not include the -Dio.netty.allocator.numDirectArenas=0 line. After we observed an OOM issue, we added the flag. Since then, the OOM issue has not happened, but it looks like the issue on index creation started.
@occho since our ioBuffer is of fixed size and we are not using any read throttling at the moment, I believe this change did not introduce a new risk in terms of memory allocation. If anything, the change should make the memory use of the networking layer more predictable. As you experienced yourself, setting -Dio.netty.allocator.numDirectArenas=0 fixed any OOM issues.
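To illustrate the point about the fixed-size ioBuffer, here is a rough sketch of the read side (not the actual CopyBytesSocketChannel code; names and types are illustrative): each socket read pulls at most the buffer's capacity off the wire before those bytes are copied onto the heap, so heap growth per read is bounded.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Illustrative only: a single read copies at most ioBuffer.capacity() bytes onto
// the heap, regardless of how large the incoming message is in total.
final class CopyBytesReadSketch {

    static int readOnce(ReadableByteChannel channel, ByteBuffer ioBuffer,
                        ByteArrayOutputStream heapTarget) throws IOException {
        ioBuffer.clear();
        int bytesRead = channel.read(ioBuffer); // bounded by ioBuffer.capacity()
        ioBuffer.flip();
        if (bytesRead > 0) {
            byte[] chunk = new byte[ioBuffer.remaining()];
            ioBuffer.get(chunk);
            heapTarget.write(chunk, 0, chunk.length); // per-read heap copy is bounded
        }
        return bytesRead;
    }
}
```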
We are still investigating what is happening actually, and hopefully we can submit a report about the issue.
Thanks for looking into that. If you feel like you don't have enough details to open a GitHub issue around a reproducible bug, feel free to share what you have (logs etc.) on our discuss forums.
Understood. Thank you for clarifying that!
Elasticsearch does not grant Netty reflection access to get Unsafe. The
only mechanism that currently exists to free direct buffers in a timely
manner is to use Unsafe. This leads to the occasional scenario, under
heavy network load, that direct byte buffers can slowly build up without
being freed.
This commit disables Netty direct buffer pooling and moves to a strategy
of using a single thread-local direct buffer for interfacing with sockets.
This will reduce the memory usage from networking. Elasticsearch
currently derives very little value from direct buffer usage (TLS,
compression, Lucene, Elasticsearch handling, etc all use heap bytes). So
this seems like the correct trade-off until that changes.