@colings86
Contributor

@colings86 colings86 commented Aug 23, 2016

The nodes usage API has two main endpoints:

/_nodes/usage and /_nodes/{nodeIds}/usage return the usage statistics
for all nodes and the specified node(s) respectively.

/_nodes/usage/_clear and /_nodes/{nodeIds}/usage/_clear clear the usage
statistics for all nodes and the specified node(s) respectively.

At the moment only one type of usage statistic is available: REST action
usage. This records the number of times each REST action class is called.
When the nodes usage API is called, it returns a map from REST action class
name to a long representing the number of times that action class has been
called.
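
To make the shape of that map concrete, here is a minimal sketch of a per-action call counter. It is not the code in this PR: the class name `RestUsageCounter`, its methods, and the action names used with it are hypothetical, and the only behaviour taken from the description is "count calls per REST action class name, return a name-to-long map, allow clearing".

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: counts calls per REST action class name. */
class RestUsageCounter {
    // Keyed by action class name; one counter per action.
    private final ConcurrentHashMap<String, AtomicLong> counts = new ConcurrentHashMap<>();

    /** Record one invocation of the given action class. */
    void recordCall(String actionClassName) {
        counts.computeIfAbsent(actionClassName, k -> new AtomicLong()).incrementAndGet();
    }

    /** Snapshot of action class name -> number of calls, sorted by name. */
    Map<String, Long> usage() {
        Map<String, Long> snapshot = new TreeMap<>();
        counts.forEach((name, count) -> snapshot.put(name, count.get()));
        return snapshot;
    }

    /** Reset all counters, which is what a _clear endpoint would do. */
    void clear() {
        counts.clear();
    }
}
```

A caller would invoke `recordCall` once per handled request and serialise the result of `usage()` into the response for the usage endpoint.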

In following PRs I want to add usage statistics for the query types and
aggregation types and this PR leaves open the ability to add other usage
statistics and filter which stats are returned.

Still to do:

  • Documentation

Member

I think using a LongAdder would probably be more efficient than AtomicLong, since the value is not going to be inspected as often as it's incremented.
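
As an illustration of this suggestion, the sketch below uses `java.util.concurrent.atomic.LongAdder` for a write-heavy counter. The wrapper class `ActionCallCounter` is hypothetical and is not taken from the PR. `LongAdder` spreads contended increments across internal cells and only combines them in `sum()`, which suits a counter that is incremented often and read rarely.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch: a counter that is written far more often than it is read. */
class ActionCallCounter {
    private final LongAdder calls = new LongAdder();

    /** Cheap under contention: updates may land in separate internal cells. */
    void increment() {
        calls.increment();
    }

    /** Combines the cells; not an atomic snapshot if increments are still in flight. */
    long count() {
        return calls.sum();
    }
}
```

The trade-off is that `sum()` is not an atomic snapshot while updates are in progress, which should be acceptable for usage counts that are only read occasionally.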

@jasontedor
Member

@colings86 Can you explain the reasoning behind adding a _clear endpoint? That is something that we've resisted calls for in the past (e.g., #9693).

@rjernst
Member

rjernst commented Aug 24, 2016

What is the difference between node usage and node stats?

@colings86
Contributor Author

@rjernst Node stats to me is about system level statistics like JVM statistics and internal ES statistics whereas node usage is intended to be more about how the user facing features of ES are being used. I think they should be in separate APIs since I think they will be used at different times, with node stats being constantly polled by a monitoring system and node usage being used less frequently to determine how often different features are being used.

@jasontedor I was on the fence about adding the _clear endpoint but I thought it could be useful and thought we actually did have it in other places. I would be happy to remove it from this PR although the tests will then become tricky. The tests had to be REST tests because the statistics are gathered at the REST layer and therefore aren't updated if the transport client is used (as in ESIntegTestCase). In REST tests, however, the cluster persists between tests, so without a _clear endpoint it is hard to validate what the REST usage statistics should be.
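
One possible way to validate counts on a cluster that persists between tests, without relying on _clear, is to compare a snapshot taken before the test with one taken after it. This is only a sketch of that idea, not something proposed in this thread: the class `UsageDelta` and its `delta` method are hypothetical, and the maps stand in for the action-name-to-count map returned by the usage endpoint.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: diff two usage snapshots instead of resetting counters. */
class UsageDelta {
    /** Returns, per action name, how much the count grew between the two snapshots. */
    static Map<String, Long> delta(Map<String, Long> before, Map<String, Long> after) {
        Map<String, Long> diff = new HashMap<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            // Actions missing from the earlier snapshot are treated as zero calls.
            long change = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            if (change != 0) {
                diff.put(e.getKey(), change);
            }
        }
        return diff;
    }
}
```

A test would then assert on the delta (for example, that a search action's count grew by exactly the number of searches the test issued) rather than on absolute values.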

@tlrx
Member

tlrx commented Aug 24, 2016

> What is the difference between node usage and node stats?
>
> Node stats to me is about system level statistics like JVM statistics and internal ES statistics whereas node usage is intended to be more about how the user facing features of ES are being used.

I understand your point @colings86 but I feel like it should go in the existing Node Stats as a metric called actions (not polled by default). We could also have them aggregated for almost free in cluster stats.

@clintongormley
Contributor

I'm on the fence on this one. I think the main reason for wanting to add this to node stats is to avoid code duplication (although the abstractions that we have mean that not that much needs to be duplicated).

The intention of this API is to gather stats about which features are being used and how frequently, so that we devs can gather feedback about which features are important and which are unused. The output from the API could become quite large. Also, it's not the type of info that would normally be monitored frequently (unlike node stats).

I wouldn't want to return this big blob of (usually uninteresting from a monitoring pov) data on every nodes stats request, but nodes stats returns all stats by default. We deliberately changed the nodes stats API from returning a selection of stats to returning all because the API was confusing, so I wouldn't like to go back to the way it was.

I'm leaning more towards a separate API.

@jasontedor
Member

> I was on the fence about adding the _clear endpoint but I thought it could be useful and thought we actually did have it in other places.

Without a compelling reason, my preference would be to leave it out.

> I would be happy to remove it from this PR although the tests will then become tricky. The tests had to be REST tests because the statistics are gathered at the REST layer and therefore aren't updated if the transport client is used (as in ESIntegTestCase)

It concerns me that we would not have unit tests for this, and I think we should strive to find a way. I'm happy to help brainstorm on this. In the worst case though, couldn't we use an integration test with the HTTP client?

@rjernst
Member

rjernst commented Aug 24, 2016

> I think the main reason for wanting to add this to node stats is to avoid code duplication (although the abstractions that we have mean that not that much needs to be duplicated).

Code duplication is not the only reason (and I don't think it is a trivial amount of code; you have to think about the numerous classes (request, response, transport, etc.) that must be created for any new API in Elasticsearch). But my main concern is about deciding when to put something where. Without a very precise, strict, bifurcated ruleset, I think there will be confusion when adding any new stats/usage (especially because the description in this PR defines the latter as "usage stats").

> I wouldn't want to return this big blob of (usually uninteresting from a monitoring pov) data on every nodes stats request, but nodes stats returns all stats by default.
>
> We deliberately changed the nodes stats API from returning a selection of stats to returning all because the API was confusing

Why does it do that? That means we must be careful about adding any new stats, for fear of "slowing down" monitoring. I think monitoring should be explicit about the sections of stats which it wants to monitor. How is that confusing?

@clintongormley
Contributor

It's confusing for the REST user who constantly has to refer to the documentation to find out which stats they can ask for, as opposed to just getting all stats back then narrowing the request down to the ones you really want.

> Without a very precise, strict, bifurcated ruleset, I think there will be confusion when adding any new stats/usage (especially because the description in this PR defines the latter as "usage stats").

The usage stats answer the question "what features are being used on this cluster?" as opposed to node stats, which answer the question "how is this node performing under what load?"

@colings86
Contributor Author

@rjernst @clintongormley I'm going to pick this up again, although given how much things have changed in the time since I raised this PR, I'm going to start from scratch rather than resolve the conflicts on this branch. Given that, I wondered if we could reach a resolution on whether the usage information is accessed through a new endpoint or if it's returned as part of the existing node stats endpoint?

@rjernst
Member

rjernst commented Apr 18, 2017

@colings86 At this point I'm fine with a separate API. However, I think it should not use "stats" in the name at all. Can you just call this "feature usage"?

@colings86
Contributor Author

@rjernst Sure, I agree that using stats in the name (or in any of the documentation) for this would be confusing

@colings86
Contributor Author

Closing in favour of #24169

@colings86 colings86 closed this May 26, 2017
@colings86 colings86 deleted the feature/usageAPI-REST branch June 21, 2017 08:42
Labels

:Core/Infra/REST API REST infrastructure and utilities >feature v5.4.1