feat(server): Garbage collector thread for project cache eviction [INGEST-1355] #1410
Conversation
relay-server/src/utils/garbage.rs (outdated)
```rust
let join_handle = std::thread::spawn(move || {
    relay_log::debug!("Start garbage collection thread");
    while let Ok(object) = rx.recv() {
        // TODO: Log size of channel queue as a gauge here
```
AFAIK the only way to do this is by having your own AtomicU32 to keep track of the number of entries inside it. The other metrics option is to count on both ends; then you have two rates which you can subtract, and no need to handle an atomic.
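For illustration, a minimal sketch of the atomic-counter approach, assuming a plain `std::sync::mpsc` channel; the `CountedQueue` type and its methods are hypothetical, not Relay's actual API:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{mpsc, Arc};

// Hypothetical wrapper that counts queued items, since std's mpsc channel
// does not expose its queue length.
struct CountedQueue<T> {
    tx: mpsc::Sender<T>,
    len: Arc<AtomicUsize>,
}

impl<T: Send + 'static> CountedQueue<T> {
    fn new() -> Self {
        let (tx, rx) = mpsc::channel::<T>();
        let len = Arc::new(AtomicUsize::new(0));
        let counter = Arc::clone(&len);
        std::thread::spawn(move || {
            while let Ok(object) = rx.recv() {
                drop(object); // the destructor runs on this thread
                counter.fetch_sub(1, Ordering::Relaxed);
            }
        });
        Self { tx, len }
    }

    fn send(&self, object: T) {
        // Increment before sending so the receiver can never decrement
        // the counter below zero.
        self.len.fetch_add(1, Ordering::Relaxed);
        if self.tx.send(object).is_err() {
            self.len.fetch_sub(1, Ordering::Relaxed);
        }
    }

    /// Current number of queued items, suitable for reporting as a gauge.
    fn queue_size(&self) -> usize {
        self.len.load(Ordering::Relaxed)
    }
}
```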
The question is whether you need this at all. The risk you are trying to mitigate is a disposal thread that doesn't keep up and a channel that grows indefinitely. Slowing down the channel is not helpful with that, and neither is having a metric: it will only let you figure out eventually that there's a metric growing together with the memory.
The way to really mitigate it is to make the channel bounded: if the channel is full, you can log an error to Sentry and start dropping in the sender thread again.
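As a rough illustration of that suggestion, here is a minimal sketch using a bounded `std::sync::mpsc::sync_channel`; the capacity, function names, and fallback behaviour are assumptions, not what the PR implements:

```rust
use std::sync::mpsc::{self, TrySendError};
use std::thread;

// Illustrative capacity; the real value would need tuning.
const DISPOSAL_QUEUE_CAPACITY: usize = 1024;

fn spawn_disposal_thread<T: Send + 'static>() -> mpsc::SyncSender<T> {
    let (tx, rx) = mpsc::sync_channel::<T>(DISPOSAL_QUEUE_CAPACITY);
    thread::spawn(move || {
        // Objects are dropped here, off the caller's thread.
        while let Ok(_object) = rx.recv() {}
    });
    tx
}

fn dispose<T>(tx: &mpsc::SyncSender<T>, object: T) {
    match tx.try_send(object) {
        Ok(()) => {}
        // Queue is full: report it and fall back to dropping on the
        // caller's thread, so memory cannot grow without bound.
        Err(TrySendError::Full(object)) => {
            relay_log::error!("garbage disposal queue is full, dropping inline");
            drop(object);
        }
        // Disposal thread is gone: also drop inline.
        Err(TrySendError::Disconnected(object)) => drop(object),
    }
}
```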
Sorry, I somehow missed this comment and added the metric anyway. The bounded channel idea makes sense to me; let's see what values the metric shows and then follow up if necessary.
I don't really understand what joining this thread buys us? It seems like a lot of complexity for no gain.
The benefit is that I can join on it in the unit test to verify its behavior :)
flub left a comment:
It might be worth adding the bounded queue as a follow-up task once you have some metrics on its size.
flub left a comment:
What happened to the custom sampling of this metric? Did you write down the motivation for getting rid of it somewhere?
```rust
}

// Log garbage queue size:
let queue_size = self.garbage_disposal.queue_size() as f64;
```
This alone was enough reason for me to report the queue size as `u64` rather than `usize` :)
Can you clarify? You could equally have `queue_size` as a `usize` here and then cast it to an `f64`.
Wait, why is this `f64` and not `u64`? Only just noticed that and now I'm confused.

- Metrics are reported as `u64`.
- We don't really know what sizes a channel can get up to, as this is not described AFAIK, but we can probably assume we'd crash before we reach the max size.
- That means deciding to expose it as a `usize` is as arbitrary as exposing it as a `u64`.
- But now here you are led to believe that the true value is represented by a `usize` and you're supposedly casting here. That is misleading.
```rust
fn name(&self) -> &'static str {
    match self {
        RelayGauges::NetworkOutage => "upstream.network_outage",
        RelayGauges::ProjectCacheGarbageQueueSize => "project_cache.garbage.queue_size",
```
If all this was to get rid of the nonsensical tag, I'd rather have gone with passing a string for the tag into the constructor. But whatever, either will do.
The main motivation was to get rid of the `i % 100` condition for logging. The utility class now does not concern itself with statsd, but simply offers a method to query its queue size, and it's up to the caller when and how to log that information, which I think is a better separation of concerns.
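To illustrate that split, a short sketch with hypothetical names (`GarbageDisposal` internals and `emit_gauge` are assumptions, not the actual Relay code):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// The utility only exposes its queue size and knows nothing about statsd.
pub struct GarbageDisposal {
    queue_size: Arc<AtomicUsize>,
}

impl GarbageDisposal {
    pub fn queue_size(&self) -> usize {
        self.queue_size.load(Ordering::Relaxed)
    }
}

// The caller decides when and how to report the value, e.g. as a statsd
// gauge named "project_cache.garbage.queue_size".
fn report_gauges(disposal: &GarbageDisposal, emit_gauge: impl Fn(&str, u64)) {
    emit_gauge("project_cache.garbage.queue_size", disposal.queue_size() as u64);
}
```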
(Note: I approved, so I'm not asking for changes.)
The metrics are either needed or not; it shouldn't be up to the caller to think of hooking up the metrics. We're not building a library but an application (and even then, but that's a larger topic).
Co-authored-by: Iker Barriocanal <[email protected]>
Spawn a background thread in `ProjectCache` that takes care of dropping `Project` objects, to shorten cache eviction times.

Switch from `std::collections::HashMap` to `hashbrown::HashMap` to take advantage of `drain_filter`.
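For readers unfamiliar with the pattern, a minimal sketch of the idea with hypothetical types and names; the real implementation lives in `relay-server/src/utils/garbage.rs` and differs in detail:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

use hashbrown::HashMap;

// Illustrative stand-in for the real Project type.
struct Project {
    last_updated_at: Instant,
}

/// Hypothetical disposal handle: values are sent to a background thread and
/// dropped there, so destructors do not run on the cache's own thread.
struct GarbageDisposal<T> {
    tx: mpsc::Sender<T>,
}

impl<T: Send + 'static> GarbageDisposal<T> {
    fn new() -> Self {
        let (tx, rx) = mpsc::channel::<T>();
        thread::spawn(move || {
            // Receiving moves the value onto this thread; letting it go out
            // of scope runs its destructor here.
            while let Ok(_object) = rx.recv() {}
        });
        Self { tx }
    }

    fn dispose(&self, object: T) {
        // If the thread is gone, the object is simply dropped inline.
        let _ = self.tx.send(object);
    }
}

fn evict_stale(
    projects: &mut HashMap<String, Project>,
    max_age: Duration,
    disposal: &GarbageDisposal<Project>,
) {
    // drain_filter removes and yields every entry for which the closure
    // returns true, without rebuilding the map.
    for (_key, project) in
        projects.drain_filter(|_, project| project.last_updated_at.elapsed() > max_age)
    {
        disposal.dispose(project);
    }
}
```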