-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Adds nodes usage API to monitor usages of actions #20124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think using a LongAdder would probably be more efficient than AtomicLong, since the value is not going to be inspected as often as its incremented
|
@colings86 Can you explain the reasoning behind adding a |
|
What is the difference between node usage and node stats? |
|
@rjernst Node stats to me is about system level statistics like JVM statistics and internal ES statistics whereas node usage is intended to be more about how the user facing features of ES are being used. I think they should be in separate APIs since I think they will be used at different times, with node stats being constantly polled by a monitoring system and node usage being used less frequently to determine how often different features are being used. @jasontedor I was on the fence about adding the |
I understand your point @colings86 but I feel like it should go in the existing Node Stats as a metric called |
|
I'm on the fence on this one. I think the main reason for wanting to add this to node stats is to avoid code duplication (although the abstractions that we have means that not that much needs to be duplicated). The intention of this API to gather stats about which features are being used and how frequently, so that us devs can gather feedback about which features are important and which are unused. The output from the API could become quite large. Also, it's not the type of info that would normally be monitored frequently (unlike node stats). I wouldn't want to return this big blob of (usually uninteresting from a monitoring pov) data on every nodes stats request, but nodes stats returns all stats by default. We deliberately changed the nodes stats API from returning a selection of stats to returning all because the API was confusing, so I wouldn't like to go back to the way it was. I'm leaning more towards a separate API. |
Without a compelling reason, my preference would be to leave it out.
It concerns me that we would not have unit tests for this, and I think we should strive to find a way. I'm happy to help brainstorm on this. In the worst case though, couldn't we use an integration test with the HTTP client? |
Code duplication is not the only reason (and I don't think it is a trival amount of code, you have to think about the numerous classes (request, response, transport, etc) that must be created for any new API in elasticsearch. But my main concern is about deciding when to put something where. Without a very precise, strict, bifurcated ruleset, I think there will be confusion when adding any new stats/usage (especially because the description in this PR defines the latter as "usage stats").
Why does it do that? That means we must be careful about adding any new stats, for fear of "slowing down" monitoring. I think monitoring should be explicit about the sections of stats which it wants to monitor. How is that confusing? |
|
It's confusing for the REST user who constantly has to refer to the documentation to find out which stats they can ask for, as opposed to just getting all stats back then narrowing the request down to the ones you really want.
The usage stats answers the question "what features are being used on this cluster?" as opposed to node stats which answers the question "how is this node performing under what load?" |
The nodes usage API has 2 main endpoints
`/_nodes/usage` and `/_nodes/{nodeIds}/usage` return the usage statistics
for all nodes and the specified node(s) respectively.
`/_nodes/usage/_clear` and `_nodes/{nodeIds}/usage/_clear clear the usage
statistics for all node and the specified node(s) respectively.
At the moment only one type of usage statistics is available, the rest
actions usage. This records the number of times each rest action class is
called and when the nodes usage api is called will return a map of rest
action class name to long representing the number of times each of the action
classes has been called.
In following PRs I want to add usage statistics for the query types and
aggregation types and this PR leaves open the ability to add other usage
statistics and filter which stats are returned.
Still to do:
* Documentation
|
@rjernst @clintongormley I'm going to pick this up again although given how much things have changed in the time since I raised this PR I'm going to start from scratch rather than resolve the conflicts on this branch. Given that I wondered if we could reach a resolution on whether the usage information is accessed through a new endpoint or if its returned as part of the existing node stats endpoint? |
|
@colings86 At this point I'm fine with a separate api. However, I think it should not use "stats" in the name it all. Can you just call this "feature usage"? |
|
@rjernst Sure, I agree that using stats in the name (or in any of the documentation) for this would be confusing |
|
Closing in favour of #24169 |
The nodes usage API has 2 main endpoints
/_nodes/usageand/_nodes/{nodeIds}/usagereturn the usage statisticsfor all nodes and the specified node(s) respectively.
/_nodes/usage/_clearand_nodes/{nodeIds}/usage/_clearclear the usagestatistics for all node and the specified node(s) respectively.
At the moment only one type of usage statistics is available, the REST
actions usage. This records the number of times each REST action class is
called and when the nodes usage api is called will return a map of rest
action class name to long representing the number of times each of the action
classes has been called.
In following PRs I want to add usage statistics for the query types and
aggregation types and this PR leaves open the ability to add other usage
statistics and filter which stats are returned.
Still to do: