Fix memory/breaker leaks for outbound responses #76474
Outbound responses would not get the expected `decRef`, resulting in memory and/or circuit breaker leaks. In particular, the `GetCcrRestoreFileChunkResponse` expects this, causing a leak when a follower bootstraps. Relates elastic#65921
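To make the leak described above concrete, here is a minimal sketch, assuming a ref-counted response that reserves bytes on a breaker when it is created and returns them when its last reference is released. `SketchBreaker`, `SketchResponse` and `SketchOutbound` are hypothetical stand-ins written for this illustration; they are not the Elasticsearch classes touched by this PR.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a circuit breaker: it only tracks reserved bytes.
final class SketchBreaker {
    private final AtomicLong used = new AtomicLong();

    void reserve(long bytes) { used.addAndGet(bytes); }
    void release(long bytes) { used.addAndGet(-bytes); }
    long used() { return used.get(); }
}

// Hypothetical ref-counted response: starts with one reference and returns
// its bytes to the breaker when the last reference is released.
final class SketchResponse {
    private final AtomicInteger refs = new AtomicInteger(1);
    private final SketchBreaker breaker;
    private final long bytes;

    SketchResponse(SketchBreaker breaker, long bytes) {
        this.breaker = breaker;
        this.bytes = bytes;
        breaker.reserve(bytes);
    }

    // Returns true if this call released the last reference.
    boolean decRef() {
        int remaining = refs.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("released more often than acquired");
        }
        if (remaining == 0) {
            breaker.release(bytes);
            return true;
        }
        return false;
    }
}

// Hypothetical outbound path, in a leaky and a fixed variant.
final class SketchOutbound {
    // Leaky variant: the response is "sent" but its reference is never released.
    static void sendLeaky(SketchResponse response) {
        // ... serialize and write the response ...
    }

    // Fixed variant: the reference is released once sending has finished,
    // even if sending throws.
    static void sendAndRelease(SketchResponse response) {
        try {
            // ... serialize and write the response ...
        } finally {
            response.decRef();
        }
    }
}

final class OutboundLeakDemo {
    public static void main(String[] args) {
        SketchBreaker breaker = new SketchBreaker();
        SketchOutbound.sendLeaky(new SketchResponse(breaker, 1024));
        System.out.println("reserved after leaky send: " + breaker.used());
        SketchOutbound.sendAndRelease(new SketchResponse(breaker, 512));
        System.out.println("reserved after fixed send: " + breaker.used());
    }
}
```

In this sketch the bytes of the first response stay reserved forever, which is the shape of the leak; the second response gives its bytes back as soon as the send completes.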
Pinging @elastic/es-distributed (Team:Distributed)
original-brownbear
left a comment
Thanks for spotting this, Henning!
Unfortunately, the tests are failing now. There must be more to this than a leak in the general case, since a leak on every response would definitely fail existing tests.
We must be failing to invoke a listener or a decrement in some exceptional case, because tests are now failing with:
com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=50, name=elasticsearch[leader1][transport_worker][T#2], state=RUNNABLE, group=TGRP-RestartIndexFollowingIT]
Caused by:
java.lang.AssertionError
at __randomizedtesting.SeedInfo.seed([3BDF36A18D3FAC5D]:0)
at org.elasticsearch.core.AbstractRefCounted.decRef(AbstractRefCounted.java:52)
at org.elasticsearch.common.bytes.ReleasableBytesReference.close(ReleasableBytesReference.java:90)
at org.elasticsearch.transport.nio.MockNioTransport$MockTcpReadWriteHandler.consumeReads(MockNioTransport.java:314)
at org.elasticsearch.nio.SocketChannelContext.handleReadBytes(SocketChannelContext.java:217)
at org.elasticsearch.nio.BytesChannelContext.read(BytesChannelContext.java:29)
at org.elasticsearch.nio.EventHandler.handleRead(EventHandler.java:128)
at org.elasticsearch.transport.nio.TestEventHandler.handleRead(TestEventHandler.java:140)
at org.elasticsearch.nio.NioSelector.handleRead(NioSelector.java:409)
at org.elasticsearch.nio.NioSelector.processKey(NioSelector.java:235)
at org.elasticsearch.nio.NioSelector.singleLoop(NioSelector.java:163)
at org.elasticsearch.nio.NioSelector.runLoop(NioSelector.java:120)
at java.base/java.lang.Thread.run(Thread.java:834)
original-brownbear
left a comment
LGTM. After discussing on another channel, this looks just fine :)
Let us do one more round of randomized tests:
@elasticmachine test this please
…1289) In #76474 we fixed a circuit breaker leak in TransportActionProxy by incrementing a reference on the TransportResponse that is later decremented by the OutboundHandler. This works well for all cases except when the request targets the node that is also the proxy node. In that case the reference is incremented but never decremented, because local execution (using TransportService#localNodeConnection and DirectResponseChannel) bypasses the OutboundHandler. This change fixes the ref counting by also decrementing the TransportResponse in DirectResponseChannel. As a consequence, the used bytes of the request circuit breaker are also decremented correctly when a GetCcrRestoreFileChunkResponse is executed on a node that is also a proxy node.
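The local-node case described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual change: `ProxiedResponse`, `SketchChannel`, `RemoteChannelSketch`, `DirectChannelSketch` and `ProxySketch` are hypothetical names, and the `releaseAfterHandling` flag exists only to contrast the behaviour before and after the fix.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical ref-counted response; not the real TransportResponse.
final class ProxiedResponse {
    private final AtomicInteger refs = new AtomicInteger(1);

    void incRef() { refs.incrementAndGet(); }

    void decRef() {
        if (refs.decrementAndGet() < 0) {
            throw new IllegalStateException("released more often than acquired");
        }
    }

    int refCount() { return refs.get(); }
}

// Hypothetical channel abstraction used by the proxy below.
interface SketchChannel {
    void sendResponse(ProxiedResponse response);
}

// Remote path: the outbound side releases the reference after writing.
final class RemoteChannelSketch implements SketchChannel {
    @Override
    public void sendResponse(ProxiedResponse response) {
        // ... serialize and write the response ...
        response.decRef();
    }
}

// Local path: the response is handed straight to a local handler, so no
// outbound code runs. The flag models the behaviour before (false) and
// after (true) the fix described above.
final class DirectChannelSketch implements SketchChannel {
    private final boolean releaseAfterHandling;

    DirectChannelSketch(boolean releaseAfterHandling) {
        this.releaseAfterHandling = releaseAfterHandling;
    }

    @Override
    public void sendResponse(ProxiedResponse response) {
        try {
            // ... invoke the local response handler directly ...
        } finally {
            if (releaseAfterHandling) {
                response.decRef();
            }
        }
    }
}

// The proxy takes an extra reference and expects the channel to release it.
final class ProxySketch {
    static void forward(ProxiedResponse response, SketchChannel channel) {
        response.incRef();
        channel.sendResponse(response);
    }
}

final class LocalProxyDemo {
    public static void main(String[] args) {
        ProxiedResponse remote = new ProxiedResponse();
        ProxySketch.forward(remote, new RemoteChannelSketch());
        System.out.println("remote path ref count: " + remote.refCount());

        ProxiedResponse localUnfixed = new ProxiedResponse();
        ProxySketch.forward(localUnfixed, new DirectChannelSketch(false));
        System.out.println("local path without release: " + localUnfixed.refCount());

        ProxiedResponse localFixed = new ProxiedResponse();
        ProxySketch.forward(localFixed, new DirectChannelSketch(true));
        System.out.println("local path with release: " + localFixed.refCount());
    }
}
```

A balanced path leaves the count where it started, at the owner's single reference. In the sketch, only the direct channel without the release ends with one reference too many, so the response could never be fully released.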
…elastic#76536" This reverts commit 411e5e7