feat(server): Garbage collector thread for project cache eviction [INGEST-1355] #1410
Conversation
relay-server/src/utils/garbage.rs (outdated)
```rust
let join_handle = std::thread::spawn(move || {
    relay_log::debug!("Start garbage collection thread");
    while let Ok(object) = rx.recv() {
        // TODO: Log size of channel queue as a gauge here
```
AFAIK the only way to do this is by having your own AtomicU32 to keep track of the number of entries inside it. The other metrics option is to count on both ends; then you have two rates which you can subtract, and no need to handle an atomic.
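For illustration, a minimal sketch of the atomic-counter approach, assuming a plain `std::sync::mpsc` channel; the `CountedQueue` type and its methods are hypothetical, not Relay's actual API:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{mpsc, Arc};

// Hypothetical wrapper that counts queued items, since std's mpsc channel
// does not expose its queue length.
struct CountedQueue<T> {
    tx: mpsc::Sender<T>,
    len: Arc<AtomicUsize>,
}

impl<T: Send + 'static> CountedQueue<T> {
    fn new() -> Self {
        let (tx, rx) = mpsc::channel::<T>();
        let len = Arc::new(AtomicUsize::new(0));
        let counter = Arc::clone(&len);
        std::thread::spawn(move || {
            while let Ok(object) = rx.recv() {
                drop(object); // the destructor runs on this thread
                counter.fetch_sub(1, Ordering::Relaxed);
            }
        });
        Self { tx, len }
    }

    fn send(&self, object: T) {
        // Increment before sending so the receiver can never decrement
        // the counter below zero.
        self.len.fetch_add(1, Ordering::Relaxed);
        if self.tx.send(object).is_err() {
            self.len.fetch_sub(1, Ordering::Relaxed);
        }
    }

    /// Current number of queued items, suitable for reporting as a gauge.
    fn queue_size(&self) -> usize {
        self.len.load(Ordering::Relaxed)
    }
}
```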
The question is whether you need this at all. The risk you are trying to mitigate is a disposal thread that doesn't keep up and a channel that grows indefinitely. Slowing down the channel is not helpful with that, and neither is having a metric: it will only let you figure out eventually that there's a metric growing together with the memory.
The way to really mitigate it is to make the channel bounded: if the channel is full, you can log an error to Sentry and start dropping in the sender thread again.
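As a rough illustration of that suggestion, here is a minimal sketch using a bounded `std::sync::mpsc::sync_channel`; the capacity, function names, and fallback behaviour are assumptions, not what the PR implements:

```rust
use std::sync::mpsc::{self, TrySendError};
use std::thread;

// Illustrative capacity; the real value would need tuning.
const DISPOSAL_QUEUE_CAPACITY: usize = 1024;

fn spawn_disposal_thread<T: Send + 'static>() -> mpsc::SyncSender<T> {
    let (tx, rx) = mpsc::sync_channel::<T>(DISPOSAL_QUEUE_CAPACITY);
    thread::spawn(move || {
        // Objects are dropped here, off the caller's thread.
        while let Ok(_object) = rx.recv() {}
    });
    tx
}

fn dispose<T>(tx: &mpsc::SyncSender<T>, object: T) {
    match tx.try_send(object) {
        Ok(()) => {}
        // Queue is full: report it and fall back to dropping on the
        // caller's thread, so memory cannot grow without bound.
        Err(TrySendError::Full(object)) => {
            relay_log::error!("garbage disposal queue is full, dropping inline");
            drop(object);
        }
        // Disposal thread is gone: also drop inline.
        Err(TrySendError::Disconnected(object)) => drop(object),
    }
}
```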
Sorry, I somehow missed this comment and added the metric anyway. The bounded channel idea makes sense to me; let's see what values the metric shows and then follow up if necessary.
I don't really understand what joining this thread buys us? It seems like a lot of complexity for no gain.
The benefit is that I can join on it in the unit test to verify its behavior :)
flub left a comment:
It might be worth adding the bounded queue as a follow-up task once you have some metrics on its size.
flub left a comment:
What happened to the custom sampling of this metric? Did you write down the motivation for getting rid of it somewhere?
```rust
}

// Log garbage queue size:
let queue_size = self.garbage_disposal.queue_size() as f64;
```
This alone was enough reason for me to report the queue size as `u64` rather than `usize` :)
Can you clarify? You could equally have `queue_size` as a `usize` here and then cast it to an `f64`.
Wait, why is this `f64` and not `u64`? Only just noticed that and now I'm confused.

- Metrics are reported as `u64`.
- We don't really know what sizes a channel can get up to, as this is not described AFAIK, but we can probably assume we'd crash before we reach the max size.
- That means deciding to expose it as a `usize` is as arbitrary as exposing it as a `u64`.
- But now here you are led to believe that the true value is represented by a `usize` and you're supposedly casting here. That is misleading.
```rust
fn name(&self) -> &'static str {
    match self {
        RelayGauges::NetworkOutage => "upstream.network_outage",
        RelayGauges::ProjectCacheGarbageQueueSize => "project_cache.garbage.queue_size",
```
If all this was to get rid of the nonsensical tag, I'd rather have gone with passing a string for the tag into the constructor. But whatever, either will do.
The main motivation was to get rid of the `i % 100` condition for logging. The utility class now does not concern itself with statsd, but simply offers a method to query its queue size, and it's up to the caller when and how to log that information, which I think is a better separation of concerns.
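To illustrate that split, a short sketch with hypothetical names (`GarbageDisposal` internals and `emit_gauge` are assumptions, not the actual Relay code):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// The utility only exposes its queue size and knows nothing about statsd.
pub struct GarbageDisposal {
    queue_size: Arc<AtomicUsize>,
}

impl GarbageDisposal {
    pub fn queue_size(&self) -> usize {
        self.queue_size.load(Ordering::Relaxed)
    }
}

// The caller decides when and how to report the value, e.g. as a statsd
// gauge named "project_cache.garbage.queue_size".
fn report_gauges(disposal: &GarbageDisposal, emit_gauge: impl Fn(&str, u64)) {
    emit_gauge("project_cache.garbage.queue_size", disposal.queue_size() as u64);
}
```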
(Note: I approved, so I'm not asking for changes.)
The metrics are either needed or not; it shouldn't be up to the caller to think of hooking up the metrics. We're not building a library but an application (and even then, but that's a larger topic).
Co-authored-by: Iker Barriocanal <[email protected]>
Spawn a background thread in `ProjectCache` that takes care of dropping `Project` objects, to shorten cache eviction times.

Switch from `std::collections::HashMap` to `hashbrown::HashMap` to take advantage of `drain_filter`.
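For readers unfamiliar with the pattern, a minimal sketch of the idea with hypothetical types and names; the real implementation lives in `relay-server/src/utils/garbage.rs` and differs in detail:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

use hashbrown::HashMap;

// Illustrative stand-in for the real Project type.
struct Project {
    last_updated_at: Instant,
}

/// Hypothetical disposal handle: values are sent to a background thread and
/// dropped there, so destructors do not run on the cache's own thread.
struct GarbageDisposal<T> {
    tx: mpsc::Sender<T>,
}

impl<T: Send + 'static> GarbageDisposal<T> {
    fn new() -> Self {
        let (tx, rx) = mpsc::channel::<T>();
        thread::spawn(move || {
            // Receiving moves the value onto this thread; letting it go out
            // of scope runs its destructor here.
            while let Ok(_object) = rx.recv() {}
        });
        Self { tx }
    }

    fn dispose(&self, object: T) {
        // If the thread is gone, the object is simply dropped inline.
        let _ = self.tx.send(object);
    }
}

fn evict_stale(
    projects: &mut HashMap<String, Project>,
    max_age: Duration,
    disposal: &GarbageDisposal<Project>,
) {
    // drain_filter removes and yields every entry for which the closure
    // returns true, without rebuilding the map.
    for (_key, project) in
        projects.drain_filter(|_, project| project.last_updated_at.elapsed() > max_age)
    {
        disposal.dispose(project);
    }
}
```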