Adds nodes usage API to monitor usages of actions #20124

colings86 · 2016-08-23T14:57:57Z

The nodes usage API has two main groups of endpoints:

/_nodes/usage and /_nodes/{nodeIds}/usage return the usage statistics
for all nodes and the specified node(s) respectively.

/_nodes/usage/_clear and /_nodes/{nodeIds}/usage/_clear clear the usage
statistics for all nodes and the specified node(s) respectively.
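
As a rough illustration of the four paths listed above, the following sketch builds them from a list of node IDs. The class and method names are hypothetical and not part of this PR, and joining multiple node IDs with commas is an assumption based on how the other nodes APIs address several nodes.

```java
import java.util.List;

// Hypothetical helper, for illustration only; not code from this PR.
final class NodesUsagePathsSketch {

    // Path that returns usage statistics: all nodes when nodeIds is empty,
    // otherwise only the listed nodes (comma-separated; an assumption).
    static String usage(List<String> nodeIds) {
        if (nodeIds.isEmpty()) {
            return "/_nodes/usage";
        }
        return "/_nodes/" + String.join(",", nodeIds) + "/usage";
    }

    // Path that clears usage statistics, as proposed in this PR.
    static String clear(List<String> nodeIds) {
        return usage(nodeIds) + "/_clear";
    }
}
```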

At the moment only one type of usage statistics is available: REST action
usage. This records the number of times each REST action class is called.
When the nodes usage API is called, it returns a map of REST action class
name to a long representing the number of times that action class has been
called.
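
A minimal sketch of the counting described above, assuming one counter per REST action class name. This is not the code in this PR: the class name `RestActionUsageSketch` and its methods are hypothetical, and the real change lives in `UsageService`.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration; names are not taken from the PR.
final class RestActionUsageSketch {

    // One counter per REST action class name.
    private final Map<String, AtomicLong> counts = new ConcurrentHashMap<>();

    // Called once per handled REST request with the handling action's class name.
    void recordCall(String actionClassName) {
        counts.computeIfAbsent(actionClassName, k -> new AtomicLong()).incrementAndGet();
    }

    // Point-in-time copy: action class name -> number of calls so far.
    Map<String, Long> usage() {
        Map<String, Long> snapshot = new TreeMap<>();
        counts.forEach((name, count) -> snapshot.put(name, count.get()));
        return snapshot;
    }

    // Mirrors the proposed _clear endpoints by dropping all counters.
    void clear() {
        counts.clear();
    }
}
```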

In following PRs I want to add usage statistics for the query types and
aggregation types. This PR leaves open the ability to add other usage
statistics and to filter which stats are returned.

Still to do:

* Documentation

dakrone · 2016-08-23T22:03:38Z

core/src/main/java/org/elasticsearch/usage/UsageService.java

I think using a LongAdder would probably be more efficient than AtomicLong, since the value is not going to be inspected as often as it's incremented.
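
To illustrate the suggestion, here is a hedged sketch of a per-action counter backed by `java.util.concurrent.atomic.LongAdder`. The class and method names are hypothetical and this is not the actual `UsageService` code. `LongAdder` spreads contended increments across internal cells and only combines them in `sum()`, which suits a counter that is written far more often than it is read.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration of the LongAdder suggestion; not code from the PR.
final class LongAdderUsageSketch {

    private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

    // Hot path: cheap under contention because LongAdder avoids a single CAS target.
    void recordCall(String actionClassName) {
        counts.computeIfAbsent(actionClassName, k -> new LongAdder()).increment();
    }

    // Cold path: sum() folds the internal cells into one value when stats are requested.
    long callsFor(String actionClassName) {
        LongAdder adder = counts.get(actionClassName);
        return adder == null ? 0L : adder.sum();
    }
}
```

Note that `sum()` is not an atomic snapshot while increments are still in flight, which is usually acceptable for usage counts.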

jasontedor · 2016-08-24T01:46:07Z

@colings86 Can you explain the reasoning behind adding a _clear endpoint? That is something that we've resisted calls for in the past (e.g., #9693).

rjernst · 2016-08-24T02:35:10Z

What is the difference between node usage and node stats?

colings86 · 2016-08-24T08:09:07Z

@rjernst Node stats to me is about system level statistics like JVM statistics and internal ES statistics whereas node usage is intended to be more about how the user facing features of ES are being used. I think they should be in separate APIs since I think they will be used at different times, with node stats being constantly polled by a monitoring system and node usage being used less frequently to determine how often different features are being used.

@jasontedor I was on the fence about adding the _clear endpoint but I thought it could be useful and thought we actually did have it in other places. I would be happy to remove it from this PR, although the tests will then become tricky. The tests had to be REST tests because the statistics are gathered at the REST layer and therefore aren't updated if the transport client is used (as in ESIntegTestCase). In REST tests, however, the cluster persists between tests, so without a _clear endpoint it is hard to validate what the REST usage statistics should be.

tlrx · 2016-08-24T08:40:06Z

> What is the difference between node usage and node stats?

> Node stats to me is about system level statistics like JVM statistics and internal ES statistics whereas node usage is intended to be more about how the user facing features of ES are being used.

I understand your point @colings86 but I feel like it should go in the existing Node Stats as a metric called actions (not polled by default). We could also have them aggregated for almost free in cluster stats.

clintongormley · 2016-08-24T10:00:39Z

I'm on the fence on this one. I think the main reason for wanting to add this to node stats is to avoid code duplication (although the abstractions that we have mean that not that much needs to be duplicated).

The intention of this API is to gather stats about which features are being used and how frequently, so that us devs can gather feedback about which features are important and which are unused. The output from the API could become quite large. Also, it's not the type of info that would normally be monitored frequently (unlike node stats).

I wouldn't want to return this big blob of (usually uninteresting from a monitoring pov) data on every nodes stats request, but nodes stats returns all stats by default. We deliberately changed the nodes stats API from returning a selection of stats to returning all because the API was confusing, so I wouldn't like to go back to the way it was.

I'm leaning more towards a separate API.

jasontedor · 2016-08-24T14:16:09Z

> I was on the fence about adding the _clear endpoint but I thought it could be useful and thought we actually did have it in other places.

Without a compelling reason, my preference would be to leave it out.

> I would be happy to remove it from this PR although the tests will then become tricky. The tests had to be REST tests because the statistics are gathered at the REST layer and therefore aren't updated if the transport client is used (as in ESIntegTestCase)

It concerns me that we would not have unit tests for this, and I think we should strive to find a way. I'm happy to help brainstorm on this. In the worst case though, couldn't we use an integration test with the HTTP client?

rjernst · 2016-08-24T19:20:11Z

> I think the main reason for wanting to add this to node stats is to avoid code duplication (although the abstractions that we have mean that not that much needs to be duplicated).

Code duplication is not the only reason (and I don't think it is a trivial amount of code; you have to think about the numerous classes (request, response, transport, etc.) that must be created for any new API in Elasticsearch). But my main concern is about deciding when to put something where. Without a very precise, strict, bifurcated ruleset, I think there will be confusion when adding any new stats/usage (especially because the description in this PR defines the latter as "usage stats").

> I wouldn't want to return this big blob of (usually uninteresting from a monitoring pov) data on every nodes stats request, but nodes stats returns all stats by default.

> We deliberately changed the nodes stats API from returning a selection of stats to returning all because the API was confusing

Why does it do that? That means we must be careful about adding any new stats, for fear of "slowing down" monitoring. I think monitoring should be explicit about the sections of stats which it wants to monitor. How is that confusing?

clintongormley · 2016-08-25T10:17:39Z

It's confusing for the REST user who constantly has to refer to the documentation to find out which stats they can ask for, as opposed to just getting all stats back then narrowing the request down to the ones you really want.

> Without a very precise, strict, bifurcated ruleset, I think there will be confusion when adding any new stats/usage (especially because the description in this PR defines the latter as "usage stats").

The usage stats answer the question "what features are being used on this cluster?" as opposed to node stats, which answer the question "how is this node performing under what load?"

colings86 · 2017-04-18T15:56:11Z

@rjernst @clintongormley I'm going to pick this up again, although given how much things have changed since I raised this PR, I'm going to start from scratch rather than resolve the conflicts on this branch. Given that, I wondered if we could reach a resolution on whether the usage information is accessed through a new endpoint or returned as part of the existing node stats endpoint?

rjernst · 2017-04-18T17:32:53Z

@colings86 At this point I'm fine with a separate API. However, I think it should not use "stats" in the name at all. Can you just call this "feature usage"?

colings86 · 2017-04-18T17:34:48Z

@rjernst Sure, I agree that using stats in the name (or in any of the documentation) for this would be confusing

colings86 · 2017-05-26T14:50:56Z

Closing in favour of #24169

colings86 added the >feature, review, :Core/Infra/REST API (REST infrastructure and utilities), and v5.0.0-beta1 labels Aug 23, 2016

dakrone reviewed Aug 23, 2016

colings86 added 2 commits September 5, 2016 11:23

review comments

1854620

clintongormley added v5.0.0 and removed v5.0.0-beta1 labels Sep 14, 2016

clintongormley added v5.1.1 and removed v5.0.0 labels Oct 18, 2016

clintongormley added v5.2.0 and removed v5.1.1 labels Dec 7, 2016

clintongormley added v5.3.0 and removed v5.2.0 labels Jan 24, 2017

clintongormley added v5.4.0 and removed v5.3.0 labels Feb 7, 2017

clintongormley added v5.4.1 and removed v5.4.0 labels Apr 28, 2017

clintongormley added v5.4.2 and removed v5.4.1 labels May 15, 2017

colings86 closed this May 26, 2017

clintongormley added v5.4.1 and removed v5.4.2 labels May 29, 2017

colings86 deleted the feature/usageAPI-REST branch June 21, 2017 08:42


6 participants
