Skip to content

Conversation

@henningandersen
Copy link
Contributor

Outbound responses would not get the expected decRef, resulting in
memory and/or circuit breaker leaks. In particular, the
GetCcrRestoreFileChunkResponse expects this, causing a leak when
a follower bootstraps.

Relates #65921

Outbound responses would not get the expected `decRef`, resulting in
memory and/or circuit breaker leaks. In particular, the
`GetCcrRestoreFileChunkResponse` expects this, causing a leak when
a follower bootstraps.

Relates elastic#65921
@henningandersen henningandersen added >bug :Distributed Coordination/Network Http and internode communication implementations :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features v8.0.0 v7.14.1 v7.15.0 labels Aug 12, 2021
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Aug 12, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Copy link
Contributor

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spotting this Henning!

unfortunately, the tests are failing now. There must be more to this than just a failure in the general case as this would definitely fail existing tests if it would leak all the time.
We must fail to invoke a listener/decrement in some exceptional case as now we're failing tests with:

    com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=50, name=elasticsearch[leader1][transport_worker][T#2], state=RUNNABLE, group=TGRP-RestartIndexFollowingIT]

        Caused by:
        java.lang.AssertionError
            at __randomizedtesting.SeedInfo.seed([3BDF36A18D3FAC5D]:0)
            at org.elasticsearch.core.AbstractRefCounted.decRef(AbstractRefCounted.java:52)
            at org.elasticsearch.common.bytes.ReleasableBytesReference.close(ReleasableBytesReference.java:90)
            at org.elasticsearch.transport.nio.MockNioTransport$MockTcpReadWriteHandler.consumeReads(MockNioTransport.java:314)
            at org.elasticsearch.nio.SocketChannelContext.handleReadBytes(SocketChannelContext.java:217)
            at org.elasticsearch.nio.BytesChannelContext.read(BytesChannelContext.java:29)
            at org.elasticsearch.nio.EventHandler.handleRead(EventHandler.java:128)
            at org.elasticsearch.transport.nio.TestEventHandler.handleRead(TestEventHandler.java:140)
            at org.elasticsearch.nio.NioSelector.handleRead(NioSelector.java:409)
            at org.elasticsearch.nio.NioSelector.processKey(NioSelector.java:235)
            at org.elasticsearch.nio.NioSelector.singleLoop(NioSelector.java:163)
            at org.elasticsearch.nio.NioSelector.runLoop(NioSelector.java:120)
            at java.base/java.lang.Thread.run(Thread.java:834)

Copy link
Contributor

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after discussing on another channel this looks just fine :)

@henningandersen
Copy link
Contributor Author

henningandersen commented Aug 13, 2021

Let us do one more round of randomized test:
@elasticmachine test this please

@henningandersen
Copy link
Contributor Author

@elasticmachine test this please

@henningandersen henningandersen merged commit b138d60 into elastic:master Aug 14, 2021
@henningandersen henningandersen added auto-backport Automatically create backport pull requests when merged and removed auto-backport Automatically create backport pull requests when merged labels Aug 14, 2021
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Aug 14, 2021
Outbound responses would not get the expected `decRef`, resulting in
memory and/or circuit breaker leaks. In particular, the
`GetCcrRestoreFileChunkResponse` expects this, causing a leak when
a follower bootstraps.

Relates elastic#65921
henningandersen added a commit that referenced this pull request Aug 15, 2021
Outbound responses would not get the expected `decRef`, resulting in
memory and/or circuit breaker leaks. In particular, the
`GetCcrRestoreFileChunkResponse` expects this, causing a leak when
a follower bootstraps.

Relates #65921
henningandersen added a commit that referenced this pull request Aug 15, 2021
Outbound responses would not get the expected `decRef`, resulting in
memory and/or circuit breaker leaks. In particular, the
`GetCcrRestoreFileChunkResponse` expects this, causing a leak when
a follower bootstraps.

Relates #65921
tlrx added a commit that referenced this pull request Nov 4, 2022
…1289)

In #76474 we fixed a circuit breaker leak in TransportActionProxy 
by incrementing a reference on the TransportResponse that is later 
decremented by the OutboundHandler.

This works well for all cases except when the request targets the 
node which is also the proxy node. In that case the reference is 
incremented but will never be decremented as the local execution 
(using TransportService#localNodeConnection and 
DirectResponseChannel) bypasses the OutboundHandler.

This change fixes the ref counting by also decrementing the 
TransportResponse in DirectResponseChannel.

This will also have the consequence to correctly decrement used 
bytes of the request circuit breaker when 
GetCcrRestoreFileChunkResponse are executed on a node that 
is also a proxy node.
tlrx added a commit to tlrx/elasticsearch that referenced this pull request Nov 4, 2022
…astic#91289)

In elastic#76474 we fixed a circuit breaker leak in TransportActionProxy 
by incrementing a reference on the TransportResponse that is later 
decremented by the OutboundHandler.

This works well for all cases except when the request targets the 
node which is also the proxy node. In that case the reference is 
incremented but will never be decremented as the local execution 
(using TransportService#localNodeConnection and 
DirectResponseChannel) bypasses the OutboundHandler.

This change fixes the ref counting by also decrementing the 
TransportResponse in DirectResponseChannel.

This will also have the consequence to correctly decrement used 
bytes of the request circuit breaker when 
GetCcrRestoreFileChunkResponse are executed on a node that 
is also a proxy node.
elasticsearchmachine pushed a commit that referenced this pull request Nov 4, 2022
…1289) (#91315)

In #76474 we fixed a circuit breaker leak in TransportActionProxy 
by incrementing a reference on the TransportResponse that is later 
decremented by the OutboundHandler.

This works well for all cases except when the request targets the 
node which is also the proxy node. In that case the reference is 
incremented but will never be decremented as the local execution 
(using TransportService#localNodeConnection and 
DirectResponseChannel) bypasses the OutboundHandler.

This change fixes the ref counting by also decrementing the 
TransportResponse in DirectResponseChannel.

This will also have the consequence to correctly decrement used 
bytes of the request circuit breaker when 
GetCcrRestoreFileChunkResponse are executed on a node that 
is also a proxy node.
lchqlchq pushed a commit to lchqlchq/elasticsearch that referenced this pull request Aug 23, 2023
lchqlchq pushed a commit to lchqlchq/elasticsearch that referenced this pull request Aug 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Distributed Coordination/Network Http and internode communication implementations :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features release highlight Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.14.1 v7.15.0 v8.0.0-alpha2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants