Commit c37b69e

Allow for customizing process identification
1 parent 1c8e407 commit c37b69e

File tree

3 files changed: +101 -51 lines changed

README.md

Lines changed: 59 additions & 46 deletions

````diff
@@ -187,17 +187,17 @@ summary_value['count'] # => 100
 All metrics can have labels, allowing grouping of related time series.
 
 Labels are an extremely powerful feature, but one that must be used with care.
-Refer to the best practices on [naming](https://prometheus.io/docs/practices/naming/) and 
+Refer to the best practices on [naming](https://prometheus.io/docs/practices/naming/) and
 [labels](https://prometheus.io/docs/practices/instrumentation/#use-labels).
 
-Most importantly, avoid labels that can have a large number of possible values (high 
+Most importantly, avoid labels that can have a large number of possible values (high
 cardinality). For example, an HTTP Status Code is a good label. A User ID is **not**.
 
 Labels are specified optionally when updating metrics, as a hash of `label_name => value`.
-Refer to [the Prometheus documentation](https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels) 
+Refer to [the Prometheus documentation](https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels)
 as to what's a valid `label_name`.
 
-In order for a metric to accept labels, their names must be specified when first initializing 
+In order for a metric to accept labels, their names must be specified when first initializing
 the metric. Then, when the metric is updated, all the specified labels must be present.
 
 Example:
@@ -215,8 +215,8 @@ You can also "pre-set" some of these label values, if they'll always be the same
 need to specify them every time:
 
 ```ruby
-https_requests_total = Counter.new(:http_requests_total, 
-                                   docstring: '...', 
+https_requests_total = Counter.new(:http_requests_total,
+                                   docstring: '...',
                                    labels: [:service, :status_code],
                                    preset_labels: { service: "my_service" })
 
@@ -231,7 +231,7 @@ with a subset (or full set) of labels set, so that you can increment / observe t
 without having to specify the labels for every call.
 
 Moreover, if all the labels the metric can take have been pre-set, validation of the labels
-is done on the call to `with_labels`, and then skipped for each observation, which can 
+is done on the call to `with_labels`, and then skipped for each observation, which can
 lead to performance improvements. If you are incrementing a counter in a fast loop, you
 definitely want to be doing this.
 
@@ -242,8 +242,8 @@ Examples:
 
 ```ruby
 # in the metric definition:
-records_processed_total = registry.counter.new(:records_processed_total, 
-                                               docstring: '...', 
+records_processed_total = registry.counter.new(:records_processed_total,
+                                               docstring: '...',
                                                labels: [:service, :component],
                                                preset_labels: { service: "my_service" })
 
@@ -256,11 +256,11 @@ class MyComponent
   def metric
     @metric ||= records_processed_total.with_labels(component: "my_component")
   end
- 
+
   def process
     records.each do |record|
       # process the record
-      metric.increment 
+      metric.increment
     end
   end
 end
@@ -280,11 +280,11 @@ metric definition will result in a
 
 - `:job`
 - `:instance`
-- `:pid`
+- `:pid` (unless you define a new `ProcessIdentity`)
 
 ## Data Stores
 
-The data for all the metrics (the internal counters associated with each labelset) 
+The data for all the metrics (the internal counters associated with each labelset)
 is stored in a global Data Store object, rather than in the metric objects themselves.
 (This "storage" is ephemeral, generally in-memory, it's not "long-term storage")
 
@@ -294,12 +294,12 @@ example), require a shared store between all the processes, to be able to report
 numbers. At the same time, other applications may not have this requirement but be very
 sensitive to performance, and would prefer instead a simpler, faster store.
 
-By having a standardized and simple interface that metrics use to access this store, 
+By having a standardized and simple interface that metrics use to access this store,
 we abstract away the details of storing the data from the specific needs of each metric.
-This allows us to then simply swap around the stores based on the needs of different 
-applications, with no changes to the rest of the client. 
+This allows us to then simply swap around the stores based on the needs of different
+applications, with no changes to the rest of the client.
 
-The client provides 3 built-in stores, but if neither of these is ideal for your 
+The client provides 3 built-in stores, but if none of these is ideal for your
 requirements, you can easily make your own store and use that instead. More on this below.
 
 ### Configuring which store to use.
@@ -317,7 +317,7 @@ NOTE: You **must** make sure to set the `data_store` before initializing any met
 If using Rails, you probably want to set up your Data Store on `config/application.rb`,
 or `config/environments/*`, both of which run before `config/initializers/*`
 
-Also note that `config.data_store` is set to an *instance* of a `DataStore`, not to the 
+Also note that `config.data_store` is set to an *instance* of a `DataStore`, not to the
 class. This is so that the stores can receive parameters. Most of the built-in stores
 don't require any, but `DirectFileStore` does, for example.
 
@@ -336,45 +336,58 @@ documentation of each store for more details.
 
 There are 3 built-in stores, with different trade-offs:
 
-- **Synchronized**: Default store. Thread safe, but not suitable for multi-process 
+- **Synchronized**: Default store. Thread safe, but not suitable for multi-process
   scenarios (e.g. pre-fork servers, like Unicorn). Stores data in Hashes, with all accesses
-  protected by Mutexes. 
+  protected by Mutexes.
+
 - **SingleThreaded**: Fastest store, but only suitable for single-threaded scenarios.
-  This store does not make any effort to synchronize access to its internal hashes, so 
+  This store does not make any effort to synchronize access to its internal hashes, so
   it's absolutely not thread safe.
+
 - **DirectFileStore**: Stores data in binary files, one file per process and per metric.
-  This is generally the recommended store to use with pre-fork servers and other 
+  This is generally the recommended store to use with pre-fork servers and other
   "multi-process" scenarios. There are some important caveats to using this store, so
   please read on in the section below.
 
+```ruby
+# process_identifier and generate_identity are optional
+DirectFileStore.new(dir: '/tmp/dfs', process_identifier: :process_name, generate_identity: -> { $0 })
+```
+
 ### `DirectFileStore` caveats and things to keep in mind
 
 Each metric gets a file for each process, and manages its contents by storing keys and
-binary floats next to them, and updating the offsets of those Floats directly. When 
-exporting metrics, it will find all the files that apply to each metric, read them, 
+binary floats next to them, and updating the offsets of those Floats directly. When
+exporting metrics, it will find all the files that apply to each metric, read them,
 and aggregate them.
 
 **Aggregation of metrics**: Since there will be several files per metric (one per process),
 these need to be aggregated to present a coherent view to Prometheus. Depending on your
-use case, you may need to control how this works. When using this store, 
+use case, you may need to control how this works. When using this store,
 each Metric allows you to specify an `:aggregation` setting, defining how
 to aggregate the multiple possible values we can get for each labelset. By default,
 Counters, Histograms and Summaries are `SUM`med, and Gauges report all their values (one
-for each process), tagged with a `pid` label. You can also select `SUM`, `MAX`, `MIN`, or
-`MOST_RECENT` for your gauges, depending on your use case.
+for each process), tagged with a `pid` label by default. You can also select `SUM`, `MAX`,
+`MIN`, or `MOST_RECENT` for your gauges, depending on your use case.
 
 Please note that the `MOST_RECENT` aggregation only works for gauges, and it does not
-allow the use of `increment` / `decrement`, you can only use `set`. 
+allow the use of `increment` / `decrement`; you can only use `set`.
+
+**Process Identity**: When defining the `DirectFileStore`, you may change how processes are
+identified. When the `process_identifier` and `generate_identity` arguments are specified,
+the default `pid` label is no longer applied. This can be used to capture the process
+name (`$0`), the Puma worker's index, or other identifying attributes. `generate_identity`
+is expected to implement `call()`.
 
 **Memory Usage**: When scraped by Prometheus, this store will read all these files, get all
 the values and aggregate them. We have noticed this can have a noticeable effect on memory
 usage for your app. We recommend you test this in a realistic usage scenario to make sure
 you won't hit any memory limits your app may have.
 
-**Resetting your metrics on each run**: You should also make sure that the directory where 
-you store your metric files (specified when initializing the `DirectFileStore`) is emptied 
-when your app starts. Otherwise, each app run will continue exporting the metrics from the 
-previous run.
+**Resetting your metrics on each run**: You should also make sure that the directory where
+you store your metric files (specified when initializing the `DirectFileStore`) is emptied
+when your app starts. Otherwise, each app run will continue exporting the metrics from the
+previous run.
 
 If you have this issue, one way to do this is to run code similar to this as part of your
 initialization:
@@ -389,15 +402,15 @@ If you are running in pre-fork servers (such as Unicorn or Puma with multiple pr
 make sure you do this **before** the server forks. Otherwise, each child process may delete
 files created by other processes on *this* run, instead of deleting old files.
 
-**Large numbers of files**: Because there is an individual file per metric and per process 
-(which is done to optimize for observation performance), you may end up with a large number 
+**Large numbers of files**: Because there is an individual file per metric and per process
+(which is done to optimize for observation performance), you may end up with a large number
 of files. We don't currently have a solution for this problem, but we're working on it.
 
-**Performance**: Even though this store saves data on disk, it's still much faster than 
-would probably be expected, because the files are never actually `fsync`ed, so the store 
-never blocks while waiting for disk. The kernel's page cache is incredibly efficient in 
-this regard. If in doubt, check the benchmark scripts described in the documentation for 
-creating your own stores and run them in your particular runtime environment to make sure 
+**Performance**: Even though this store saves data on disk, it's still much faster than
+would probably be expected, because the files are never actually `fsync`ed, so the store
+never blocks while waiting for disk. The kernel's page cache is incredibly efficient in
+this regard. If in doubt, check the benchmark scripts described in the documentation for
+creating your own stores and run them in your particular runtime environment to make sure
 this provides adequate performance.
 
 
@@ -406,7 +419,7 @@ this provides adequate performance.
 If none of these stores is suitable for your requirements, you can easily make your own.
 
 The interface and requirements of Stores are specified in detail in the `README.md`
-in the `client/data_stores` directory. This thoroughly documents how to make your own 
+in the `client/data_stores` directory. This thoroughly documents how to make your own
 store.
 
 There are also links there to non-built-in stores created by others that may be useful,
@@ -418,16 +431,16 @@ If you are in a multi-process environment (such as pre-fork servers like Unicorn
 process will probably keep their own counters, which need to be aggregated when receiving
 a Prometheus scrape, to report coherent total numbers.
 
-For Counters, Histograms and quantile-less Summaries this is simply a matter of 
+For Counters, Histograms and quantile-less Summaries this is simply a matter of
 summing the values of each process.
 
-For Gauges, however, this may not be the right thing to do, depending on what they're 
+For Gauges, however, this may not be the right thing to do, depending on what they're
 measuring. You might want to take the maximum or minimum value observed in any process,
 rather than the sum of all of them. By default, we export each process's individual
 value, with a `pid` label identifying each one.
 
-If these defaults don't work for your use case, you should use the `store_settings` 
-parameter when registering the metric, to specify an `:aggregation` setting. 
+If these defaults don't work for your use case, you should use the `store_settings`
+parameter when registering the metric, to specify an `:aggregation` setting.
 
 ```ruby
 free_disk_space = registry.gauge(:free_disk_space_bytes,
@@ -438,8 +451,8 @@ free_disk_space = registry.gauge(:free_disk_space_bytes,
 NOTE: This will only work if the store you're using supports the `:aggregation` setting.
 Of the built-in stores, only `DirectFileStore` does.
 
-Also note that the `:aggregation` setting works for all metric types, not just for gauges. 
-It would be unusual to use it for anything other than gauges, but if your use-case 
+Also note that the `:aggregation` setting works for all metric types, not just for gauges.
+It would be unusual to use it for anything other than gauges, but if your use-case
 requires it, the store will respect your aggregation wishes.
 
 ## Tests
````
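The README's new `DirectFileStore` example above only shows the call site. As a rough sketch of what the two identity strategies produce, in plain Ruby without the gem (the `my_service` label set is invented illustration data, not part of the commit):

```ruby
# The store identifies each process by calling a generator. These two
# lambdas mirror the default (Process.pid) and the customized ($0, the
# program name) behaviour described in the README diff above.
default_generate_identity = -> { Process.pid }
custom_generate_identity  = -> { $0 }

base_labels = { service: "my_service" }

# Default: per-process gauge values are tagged with a :pid label.
default_labels = base_labels.merge(pid: default_generate_identity.call)

# Custom: with process_identifier: :process_name, the store would tag
# values with :process_name instead, and :pid is never added.
custom_labels = base_labels.merge(process_name: custom_generate_identity.call)

puts default_labels.inspect
puts custom_labels.inspect
```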

lib/prometheus/client/data_stores/direct_file_store.rb

Lines changed: 21 additions & 4 deletions

```diff
@@ -33,8 +33,9 @@ class InvalidStoreSettingsError < StandardError; end
   DEFAULT_METRIC_SETTINGS = { aggregation: SUM }
   DEFAULT_GAUGE_SETTINGS = { aggregation: ALL }
 
-  def initialize(dir:)
+  def initialize(dir:, process_identifier: :pid, generate_identity: -> { Process.pid })
     @store_settings = { dir: dir }
+    @process_identity = ProcessIdentity.new(identifier_name: process_identifier, generator: generate_identity)
     FileUtils.mkdir_p(dir)
   end
@@ -49,7 +50,8 @@ def for_metric(metric_name, metric_type:, metric_settings: {})
 
     MetricStore.new(metric_name: metric_name,
                     store_settings: @store_settings,
-                    metric_settings: settings)
+                    metric_settings: settings,
+                    process_identity: @process_identity)
   end
 
   private
@@ -72,13 +74,27 @@ def validate_metric_settings(metric_type, metric_settings)
     end
   end
 
+  class ProcessIdentity
+    def initialize(identifier_name:, generator:)
+      raise unless generator.respond_to?(:call)
+
+      @identifier_name = identifier_name
+      @generator = generator
+    end
+
+    def insert_label!(labels)
+      labels[@identifier_name] = @generator.call
+    end
+  end
+
   class MetricStore
     attr_reader :metric_name, :store_settings
 
-    def initialize(metric_name:, store_settings:, metric_settings:)
+    def initialize(metric_name:, store_settings:, metric_settings:, process_identity:)
       @metric_name = metric_name
       @store_settings = store_settings
       @values_aggregation_mode = metric_settings[:aggregation]
+      @process_identity = process_identity
       @store_opened_by_pid = nil
 
       @lock = Monitor.new
@@ -163,10 +179,11 @@ def in_process_sync
 
     def store_key(labels)
       if @values_aggregation_mode == ALL
-        labels[:pid] = process_id
+        @process_identity.insert_label!(labels)
       end
 
       labels.to_a.sort.map{|k,v| "#{CGI::escape(k.to_s)}=#{CGI::escape(v.to_s)}"}.join('&')
+
     end
 
     def internal_store
```
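To see what the new `ProcessIdentity` actually changes, here is a self-contained re-creation of the class from the diff above, fed into the same serialization that `MetricStore#store_key` uses (the `:process_name` / `"worker-1"` generator is an invented example, not a default):

```ruby
require 'cgi'

# Re-creation of the ProcessIdentity class added in this commit.
class ProcessIdentity
  def initialize(identifier_name:, generator:)
    # The commit rejects generators that are not callable.
    raise unless generator.respond_to?(:call)

    @identifier_name = identifier_name
    @generator = generator
  end

  # Adds the identity label (:pid by default, or a custom name) to a labelset.
  def insert_label!(labels)
    labels[@identifier_name] = @generator.call
  end
end

# A custom identity, as DirectFileStore.new would build it from the
# process_identifier: / generate_identity: arguments.
identity = ProcessIdentity.new(identifier_name: :process_name,
                               generator: -> { "worker-1" })

labels = { service: "my_service" }
identity.insert_label!(labels)

# Same serialization as MetricStore#store_key applies to file entries:
key = labels.to_a.sort.map { |k, v| "#{CGI.escape(k.to_s)}=#{CGI.escape(v.to_s)}" }.join('&')
puts key  # → "process_name=worker-1&service=my_service"
```

Note the bare `raise` in the guard surfaces as a plain `RuntimeError` when the generator does not respond to `call`.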

spec/prometheus/client/data_stores/direct_file_store_spec.rb

Lines changed: 21 additions & 1 deletion

```diff
@@ -192,6 +192,26 @@
     end
   end
 
+  context "gauge with custom process identifier" do
+    subject { described_class.new(dir: "/tmp/prometheus_test", process_identifier: process_identifier, generate_identity: generate_identity) }
+
+    let(:process_identifier) { :process_name }
+    let(:generate_identity) { -> { "rspec" } }
+    let(:labels) { metric_store.all_values.keys.first }
+
+    it "does not use default key value" do
+      metric_store = subject.for_metric(:metric_name, metric_type: :gauge)
+      metric_store.set(labels: {}, val: 1)
+      expect(labels).to_not include(:pid)
+    end
+
+    it "includes process_name with rspec" do
+      metric_store = subject.for_metric(:metric_name, metric_type: :gauge)
+      metric_store.set(labels: {}, val: 1)
+      expect(labels).to include(process_name: "rspec")
+    end
+  end
+
   context "with a metric that takes MAX instead of SUM" do
     it "reports the maximum values from different processes" do
       allow(Process).to receive(:pid).and_return(12345)
@@ -347,7 +367,7 @@
       truncate_calls_count = 0
       allow_any_instance_of(Prometheus::Client::DataStores::DirectFileStore::FileMappedDict).
         to receive(:resize_file).and_wrap_original do |original_method, *args, &block|
- 
+
         truncate_calls_count += 1
         original_method.call(*args, &block)
       end
```
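Tying the three files together: an application opting into the new arguments would configure the store once, before any metrics are initialized. A sketch following the README's Rails advice (the directory and the `:process_name` / `$0` choices are this commit's example values, not defaults):

```ruby
# config/application.rb — must run before any metric is initialized.
require 'prometheus/client'
require 'prometheus/client/data_stores/direct_file_store'

Prometheus::Client.config.data_store =
  Prometheus::Client::DataStores::DirectFileStore.new(
    dir: '/tmp/prometheus_metrics',
    # Optional since this commit; omitting both keeps the old
    # behaviour of a :pid label generated from Process.pid.
    process_identifier: :process_name,
    generate_identity: -> { $0 }
  )
```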
