@Tim-Brooks
Contributor

@Tim-Brooks Tim-Brooks commented Jul 8, 2020

We have recently added internal metrics to monitor the amount of
indexing occurring on a node. These metrics introduce back pressure to
indexing when memory utilization is too high. This commit exposes these
stats through the node stats API.
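
For readers unfamiliar with the mechanism, here is a minimal sketch of byte-based back pressure with counters that a stats endpoint could report. The class `IndexingPressureSketch`, its methods, and the single fixed limit are illustrative assumptions, not the code in this PR.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: names and structure are assumptions, not the Elasticsearch API.
class IndexingPressureSketch {
    private final long limitBytes;                             // assumed fixed memory budget
    private final AtomicLong currentBytes = new AtomicLong();  // bytes held by in-flight requests
    private final AtomicLong totalBytes = new AtomicLong();    // bytes accepted since startup
    private final AtomicLong rejections = new AtomicLong();    // requests rejected for exceeding the limit

    IndexingPressureSketch(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Reserves bytes for a request. Returns a callback that releases them,
    // or throws if the reservation would exceed the limit (the back pressure).
    Runnable markBytes(long bytes) {
        long updated = currentBytes.addAndGet(bytes);
        if (updated > limitBytes) {
            currentBytes.addAndGet(-bytes); // undo the reservation
            rejections.incrementAndGet();
            throw new IllegalStateException("rejected: " + updated + " bytes exceeds limit " + limitBytes);
        }
        totalBytes.addAndGet(bytes);
        return () -> currentBytes.addAndGet(-bytes);
    }

    // Values a stats API could expose.
    long currentBytes() { return currentBytes.get(); }
    long totalBytes() { return totalBytes.get(); }
    long rejections() { return rejections.get(); }
}
```

A caller would invoke `markBytes` when a request arrives and run the returned callback once the request completes.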

@Tim-Brooks Tim-Brooks added >non-issue :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. v8.0.0 v7.9.0 labels Jul 8, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/CRUD)

@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jul 8, 2020
@Tim-Brooks Tim-Brooks changed the title from "Implement rejections in WriteMemoryLimits" to "Adding indexing pressure stats to node stats API" Jul 8, 2020
@Tim-Brooks Tim-Brooks requested a review from ywelsch July 9, 2020 04:09
Contributor

@ywelsch ywelsch left a comment

Thanks Tim. I've left some smaller comments, but largely looking good.


import java.io.IOException;

public class IndexingPressureStats implements Writeable, ToXContentFragment {
Contributor

You've suggested in Slack doing different buckets (coordinating vs. primary) in metrics, which I think is a good idea, as it will make it clearer that some nodes might only be doing coordination, while others might not be client-facing and therefore doing purely primary (+ replica) work.

Contributor Author

I explored this on Thursday and ran into some issues.

Essentially, it is reasonably straightforward to mark the bytes as coordinating at the TransportBulkAction level. But here is the problem I ran into. When a node receives the TransportShardBulkAction transport message, we mark the bytes, and as we discussed, we would mark them as primary in the updated version. However, we use the rerouteWasLocal marker to know whether we need to mark the bytes at the primaryAction level. In the new version we would need to mark the bytes when rerouteWasLocal is true, but only on the TransportBulkAction node and not on the node that receives the TransportShardBulkAction message. I am not sure how to distinguish those two cases.

We could mark coordinating bytes when TransportShardBulkAction is received, and then mark bytes at the primary level. We can mark a delta when rerouteWasLocal is true and use that delta to avoid double-counted rejections. That just means that a primary node will always mark both coordinating and primary bytes. Technically the reroute component can be considered coordinating work, but it feels kind of weird.
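
To make the double-accounting concern concrete, here is a rough sketch of one way the split could work. It is only an illustration: `SplitPressureSketch`, its method names, and the single combined limit are assumptions, not what this PR implements. The idea is that the coordinating and primary buckets are tracked separately for stats, while only a combined counter is checked against the limit. When the reroute was local, the bytes were already reserved at the coordinating stage, so the primary stage records them in its own bucket without reserving them again.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: names and structure are assumptions, not the Elasticsearch API.
class SplitPressureSketch {
    private final long limitBytes;
    private final AtomicLong combinedBytes = new AtomicLong();     // the only counter checked against the limit
    private final AtomicLong coordinatingBytes = new AtomicLong(); // stats bucket: coordinating work
    private final AtomicLong primaryBytes = new AtomicLong();      // stats bucket: primary work
    private final AtomicLong rejections = new AtomicLong();

    SplitPressureSketch(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    // Called where the bulk request is first received (the coordinating stage).
    Runnable markCoordinating(long bytes) {
        reserveCombined(bytes);
        coordinatingBytes.addAndGet(bytes);
        return () -> {
            combinedBytes.addAndGet(-bytes);
            coordinatingBytes.addAndGet(-bytes);
        };
    }

    // Called at the primary stage. If the reroute was local, the same bytes were
    // already reserved by markCoordinating on this node, so skip the limit check.
    Runnable markPrimary(long bytes, boolean rerouteWasLocal) {
        if (!rerouteWasLocal) {
            reserveCombined(bytes);
        }
        primaryBytes.addAndGet(bytes);
        return () -> {
            if (!rerouteWasLocal) {
                combinedBytes.addAndGet(-bytes);
            }
            primaryBytes.addAndGet(-bytes);
        };
    }

    private void reserveCombined(long bytes) {
        long updated = combinedBytes.addAndGet(bytes);
        if (updated > limitBytes) {
            combinedBytes.addAndGet(-bytes); // undo the reservation
            rejections.incrementAndGet();
            throw new IllegalStateException("rejected: " + updated + " bytes exceeds limit " + limitBytes);
        }
    }

    long combinedBytes() { return combinedBytes.get(); }
    long coordinatingBytes() { return coordinatingBytes.get(); }
    long primaryBytes() { return primaryBytes.get(); }
    long rejections() { return rejections.get(); }
}
```

This sketch sidesteps the question raised above by having the caller pass `rerouteWasLocal` explicitly; how a node would know which case it is in remains the open issue.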

Let's discuss tomorrow and see if we still want to pursue this breakdown.

@Tim-Brooks Tim-Brooks requested a review from ywelsch July 13, 2020 04:04
Contributor

@ywelsch ywelsch left a comment

LGTM. Thanks Tim!

@Tim-Brooks Tim-Brooks merged commit b87bb86 into elastic:master Jul 13, 2020
Tim-Brooks added a commit to Tim-Brooks/elasticsearch that referenced this pull request Jul 13, 2020
Labels

:Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. >non-issue Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v7.9.0 v8.0.0-alpha1

4 participants