-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Adding indexing pressure stats to node stats API #59247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Pinging @elastic/es-distributed (:Distributed/CRUD) |
WriteMemoryLimits
ywelsch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Tim. I've left some smaller comments, but largely looking good.
server/src/main/java/org/elasticsearch/index/stats/IndexingPressureStats.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/stats/IndexingPressureStats.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/IndexingPressure.java
Outdated
Show resolved
Hide resolved
|
|
||
| import java.io.IOException; | ||
|
|
||
| public class IndexingPressureStats implements Writeable, ToXContentFragment { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've suggested in Slack doing different buckets (coordinating vs. primary) in metrics, which I think is a good idea, as it will make it clearer that some nodes might only be doing coordination, while others might not be client-facing and therefore doing purely primary (+ replica) work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I explored this on Thursday and ran into some issues.
Essentially it is reasonably straightforward to mark the bytes are coordinating at the TransportBulkAction level. But here is the problem I ran into. When a node receives the TransportShardBulkAction transport message we mark the bytes. And as we discussed we would mark the bytes as primary in the updated version. However, we use the rerouteWasLocal marker to know whether we need to mark the bytes at the primaryAction level. In this new version we would need to mark the bytes if the rerouteWasLocal but on the TransportBulkAction node. But not on the node that receives the TransportShardBulkAction message. And I am not sure how to delineate there.
We could mark coordinating bytes when TransportShardBulkAction is received. And then mark bytes at the primary level. We can mark a delta when the rerouteWasLocal is true. And use that delta to avoid double accounted rejections. That just means that a primary node will always mark both coordinating and primary bytes. Technically the reroute component can be coordinating work. But it feels kind of weird.
Let's discuss tomorrow and see if we still want to pursue this breakdown.
ywelsch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks Tim!
We have recently added internal metrics to monitor the amount of indexing occurring on a node. These metrics introduce back pressure to indexing when memory utilization is too high. This commit exposes these stats through the node stats API.
We have recently added internal metrics to monitor the amount of
indexing occurring on a node. These metrics introduce back pressure to
indexing when memory utilization is too high. This commit exposes these
stats through the node stats API.