Skip to content

Conversation

@original-brownbear
Copy link
Contributor

Pooled direct Netty buffers can destabilize
ES easily starting in v7.4.0 due to the move
to Netty 4.1.38 which uses direct buffers for
all IO allocations by default.
If you don't have direct buffer pooling disabled
you won't run the new IO path from #44837 and
large messages (e.g. huge bulk requests) will cause
allocation of unpooled direct byte buffers of the
size of the message and will often lead to the
cluster slowly going OOM.

People who upgrade their ES cluster but keep their old
jvm.options file won't have the now default -Dio.netty.allocator.numDirectArenas=0
set and might run into memory trouble.

See this discuss issue for example of bad behavior with direct buffers in 7.4: https://discuss.elastic.co/t/cluster-constantly-crashing-after-upgrade-to-7-4/202580

Pooled direct Netty buffers can destabilize
ES easily starting in v7.4.0 due to the move
to Netty 4.1.38 which uses direct buffers for
all IO allocations by default.
If you don't have direct buffer pooling disabled
you won't run the new IO path from elastic#44837 and
large messages (e.g. huge bulk requests) will cause
allocation of unpooled direct byte buffers of the
size of the message and will often lead to the
cluster slowly going OOM.

People who upgrade their ES cluster but keep their
jvm.options file won't have the now default `-Dio.netty.allocator.numDirectArenas=0`
set and might run into memory trouble.
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Network)

Copy link
Contributor

@Tim-Brooks Tim-Brooks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - but lets have @ywelsch or @jasontedor confirm that this makes sense. It is kind of weird to deprecate a system property that we have never really had an official stance on using/not using. But I agree that some dangerous logging makes sense.

@ywelsch
Copy link
Contributor

ywelsch commented Oct 11, 2019

We could also put this back into JvmErgonomics, i.e. undo #46104, as well as keep it in the default jvm.options file. It feels odd to me to warn users about this but not enforce/default to it. The problem is that jvm.options mixes settings that we would like users to be able to change, while others they should preferably never touch (and require a larger amount of effort to change). Perhaps a topic for @elastic/es-core-infra?

@original-brownbear
Copy link
Contributor Author

@ywelsch

It feels odd to me to warn users about this but not enforce/default to it.

True. My suggestion would be to simply warn now because that's the least intrusive change and then simply remove the option of using this setting to go through the old IO path in 7.5 given how horribly the old path now performs? Then we don't need the property at all and can just handle this via code and setting up the allocator accordingly I think.

@ywelsch
Copy link
Contributor

ywelsch commented Oct 25, 2019

This is superseded by #47956, right?

@original-brownbear
Copy link
Contributor Author

Right :) -> closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants