
Conversation

@Tim-Brooks (Contributor)

Elasticsearch does not grant Netty reflection access to get Unsafe. The
only mechanism that currently exists to free direct buffers in a timely
manner is to use Unsafe. This leads to the occasional scenario, under heavy network load, in which direct byte buffers slowly build up without being freed.

This commit disables Netty direct buffer pooling and moves to a strategy
of using a single thread-local direct buffer for interfacing with sockets.
This will reduce the memory usage from networking. Elasticsearch
currently derives very little value from direct buffer usage (TLS,
compression, Lucene, Elasticsearch handling, etc. all use heap bytes). So
this seems like the correct trade-off until that changes.
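
In sketch form, the per-thread buffer strategy looks roughly like this (illustrative class and method names, and an assumed 1 MB buffer size; not the exact code added by this commit):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

import io.netty.buffer.ByteBuf;

// Illustrative sketch, not the class added by this PR: one fixed-size direct buffer
// per I/O thread is reused for every socket read, and the bytes are then copied into
// a heap-backed ByteBuf, so no per-read direct allocation takes place.
final class ThreadLocalIoBuffer {

    private static final ThreadLocal<ByteBuffer> IO_BUFFER =
        ThreadLocal.withInitial(() -> ByteBuffer.allocateDirect(1024 * 1024)); // assumed 1 MB

    static int readToHeap(SocketChannel channel, ByteBuf destination) throws IOException {
        ByteBuffer ioBuffer = IO_BUFFER.get();
        ioBuffer.clear();                        // reuse the same direct buffer for every read
        int bytesRead = channel.read(ioBuffer);  // the kernel fills the direct buffer
        ioBuffer.flip();
        if (bytesRead > 0) {
            destination.writeBytes(ioBuffer);    // copy the bytes into the heap ByteBuf
        }
        return bytesRead;
    }
}
```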

@Tim-Brooks Tim-Brooks added :Distributed Coordination/Network Http and internode communication implementations team-discuss v8.0.0 v7.4.0 labels Jul 24, 2019
@Tim-Brooks Tim-Brooks requested review from jasontedor and s1monw July 24, 2019 22:36
@elasticmachine (Collaborator)

Pinging @elastic/es-distributed

@Tim-Brooks (Contributor, Author) commented Jul 24, 2019

Every Netty channel has a buffer for outgoing writes. Immediately prior to putting outgoing messages in this buffer, Netty attempts to wrap the heap bytes (always the case for ES) in direct bytes. Unfortunately, due to friction between ES and Netty, both the heap bytes and the direct bytes are retained until the flush is complete, which increases memory usage. Additionally, having direct buffer pooling enabled means 16 MB direct chunks per thread. This can scale to gigabytes under load.

We have remediated the OOM issue by rate limiting writes (for the internal transport) and setting JVM options to force GC collections at certain levels of direct memory usage. But IMO I don't see significant value in having this direct memory pooling enabled when we are currently so heap-bytes oriented. The strategy in this PR (1 MB IO buffer per thread + copying) is the strategy we use for transport-nio, and we see that transport-nio performs similarly (or slightly better) on our nightlies. I think that a strategy that significantly reduces memory usage in exchange for the possibility of more copying is the correct strategy.
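
As a rough sketch of what copying through a fixed per-thread direct buffer means on the write side (names and structure are illustrative, not the PR's actual code):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;

import io.netty.buffer.ByteBuf;

// Sketch only: copy outgoing heap bytes through a reused, fixed-size direct buffer in
// bounded chunks, so the per-thread direct memory footprint stays constant regardless
// of message size. Written blocking-style for simplicity.
final class ThreadLocalIoWriter {

    static void writeFromHeap(SocketChannel channel, ByteBuf source, ByteBuffer ioBuffer)
            throws IOException {
        while (source.isReadable()) {
            ioBuffer.clear();
            int chunk = Math.min(source.readableBytes(), ioBuffer.capacity());
            ioBuffer.limit(chunk);
            source.readBytes(ioBuffer);      // copy a bounded chunk of heap bytes into the direct buffer
            ioBuffer.flip();
            while (ioBuffer.hasRemaining()) {
                channel.write(ioBuffer);     // hand the direct buffer to the socket
            }
        }
    }
}
```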

I had opened a PR (netty/netty#9183) with Netty that would allow us to set our own Unsafe-based cleaner so that Netty buffer pooling could be used. However, it looks like Netty is not going to accept the PR. The most recent suggestions are (1) we give them access to Unsafe and they create a code path that uses it only for cleaning, and (2) we completely rebuild the memory allocators. I assume that 1 is not going to get much support, and 2 is a very significant commitment when I don't think we currently need it.

@Tim-Brooks Tim-Brooks added the WIP label Jul 24, 2019
@Tim-Brooks (Contributor, Author)

Also, I added the team-discuss label so we will discuss this in a distributed meeting. I tagged Jason and Simon so that you two are aware that I would like to consider this.

@Tim-Brooks Tim-Brooks removed the WIP label Jul 31, 2019
@Tim-Brooks (Contributor, Author)

This is ready for review.

@s1monw (Contributor) left a comment

FWIW I think this makes a lot of sense to me.

@original-brownbear (Contributor) left a comment

Looks just fine overall, thanks @tbrooks8! I added a few comments on testing, docs and minor optimizations.

systemProperty 'es.set.netty.runtime.available.processors', 'false'

// Disable direct buffer pooling as it is disabled by default in Elasticsearch
systemProperty 'io.netty.allocator.numDirectArenas', '0'
Contributor

Shouldn't we maybe randomise this for tests via the test seed, since it has non-trivial effects on what code actually executes?

Contributor (Author)

I'm not clear how to accomplish this. I spoke to @mark-vieira and he indicated there is not a great way to randomize based on a seed in a gradle file. I can apply the system property in ESTestCase, but the system properties are set prior to the random utils being available.

I'm also not clear on the extent to which I should invest time in this. I think our expectation is that these system properties should not be messed with unless you really know what you are doing. But if you have some straightforward way in mind to improve coverage, let me know.

Contributor

We do this for Azure:

https://github.com/elastic/elasticsearch/blob/master/plugins/repository-azure/qa/microsoft-azure-storage/build.gradle#L85

so maybe just something like:

Long.parseUnsignedLong(project.rootProject.testSeed.tokenize(':').get(0), 16) % 2 == 0

to get a 50:50 split boolean? @mark-vieira any reason not to do this? (I think we do exactly this in multiple places already.)
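
In plain Java the parity check amounts to something like this (the seed value and class name are made up for illustration):

```java
// Hypothetical, stand-alone rendering of the Groovy one-liner; the seed value is made up.
public final class SeedSplitExample {
    public static void main(String[] args) {
        String testSeed = "DEADBEEF12345678:CAFEBABE";   // assumed "runnerSeed:methodSeed" format
        String rootSeed = testSeed.split(":")[0];        // same token as tokenize(':').get(0)
        boolean evenSeed = Long.parseUnsignedLong(rootSeed, 16) % 2 == 0;
        // A build script could set io.netty.allocator.numDirectArenas to '0' or leave the
        // default based on this boolean, giving a 50:50 split across test seeds.
        System.out.println("disable direct arenas for this seed: " + evenSeed);
    }
}
```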

systemProperty 'es.set.netty.runtime.available.processors', 'false'

// Disable direct buffer pooling as it is disabled by default in Elasticsearch
systemProperty 'io.netty.allocator.numDirectArenas', '0'
Contributor

Same here, maybe randomise here?

@original-brownbear (Contributor) left a comment

LGTM thanks @tbrooks8 !

Would be interesting to hear what @mark-vieira thinks about our test seed hack in the build files, but even if he's fine with it I think we can just add that in a follow-up -> no need to delay things here :)

@Tim-Brooks (Contributor, Author)

Would be interesting to hear what @mark-vieira thinks about our test seed hack in the build files, but even if he's fine with it I think we can just add that in a follow-up -> no need to delay things here :)

I'll open a follow-up. It probably also makes sense to randomize pooled/unpooled since we set those to different values based on JVM ergonomics. So we can have a PR dedicated to just randomizing the system properties.

@Tim-Brooks (Contributor, Author)

@elasticmachine run elasticsearch-ci/1

@Tim-Brooks (Contributor, Author)

@elasticmachine run elasticsearch-ci/bwc

@Tim-Brooks (Contributor, Author)

@elasticmachine run elasticsearch-ci/packaging-sample

@Tim-Brooks Tim-Brooks merged commit e0f9d61 into elastic:master Aug 8, 2019
Tim-Brooks added a commit that referenced this pull request Aug 8, 2019
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Oct 9, 2019
Pooled direct Netty buffers can destabilize ES easily starting in v7.4.0 due to the move to Netty 4.1.38, which uses direct buffers for all IO allocations by default. If you don't have direct buffer pooling disabled, you won't run the new IO path from elastic#44837, and large messages (e.g. huge bulk requests) will cause allocation of unpooled direct byte buffers the size of the message, often leading to the cluster slowly going OOM.

People who upgrade their ES cluster but keep their jvm.options file won't have the now-default `-Dio.netty.allocator.numDirectArenas=0` set and might run into memory trouble.
int bytesRead = javaChannel().read(ioBuffer);
ioBuffer.flip();
if (bytesRead > 0) {
    byteBuf.writeBytes(ioBuffer);
@occho commented Nov 16, 2019

I'm thinking this line possibly expands byteBuf capacity. That behavior is different from NioSocketChannel, which tries to ensure the expansion does not happen.
Is this acceptable? What are the risks of it?

https://github.com/netty/netty/blob/netty-4.1.38.Final/transport/src/main/java/io/netty/channel/socket/nio/NioSocketChannel.java#L347

We experienced an issue after upgrading an ES cluster from 7.1.1 to 7.4.2. When a bunch of index creations happen, like hundreds of indices, master/data nodes start to leave the cluster and the cluster status becomes red. We are still investigating what is actually happening, and hopefully we can submit a report about the issue.

When we first ran ES 7.4.2, jvm.options did not include the -Dio.netty.allocator.numDirectArenas=0 line. After we observed an OOM issue, we added the flag, and since then the OOM issue has not happened. But it looks like the issue with index creation started after that.

Contributor

@occho since our ioBuffer is of fixed size and we are not using any read throttling at the moment, I believe this change did not introduce a new risk in terms of memory allocation. If anything, the change should make the memory use of the networking layer more predictable. As you experienced yourself, setting -Dio.netty.allocator.numDirectArenas=0 fixed any OOM issues.

We are still investigating what is actually happening, and hopefully we can submit a report about the issue.

Thanks for looking into that. If you feel like you don't have enough details to open a GitHub issue around a reproducible bug, feel free to share what you have (logs etc.) on our Discuss forums.

@occho

Understood. Thank you for clarifying that!

@Tim-Brooks Tim-Brooks deleted the disable_direct_pooling branch December 18, 2019 14:54