Conversation

@jjbayer (Member) commented Aug 11, 2022

Spawn a background thread in ProjectCache that takes care of dropping Project objects, to shorten cache eviction times.

Switch from std::collections::HashMap to hashbrown::HashMap to take advantage of drain_filter.
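
A minimal sketch of the eviction path described above, assuming a plain mpsc channel feeding the disposal thread (the function and its parameters are hypothetical, not this PR's actual code):

```rust
use std::sync::mpsc::Sender;

use hashbrown::HashMap;

/// Hypothetical eviction step: expired entries are moved out of the cache
/// with `drain_filter` and handed to the disposal thread, so the expensive
/// drop no longer happens on the eviction path.
fn evict_stale<K, V>(cache: &mut HashMap<K, V>, tx: &Sender<V>, is_expired: impl Fn(&V) -> bool)
where
    K: Eq + std::hash::Hash,
{
    for (_key, value) in cache.drain_filter(|_, value| is_expired(value)) {
        // If the send fails because the thread is gone, the value is
        // dropped right here instead.
        let _ = tx.send(value);
    }
}
```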

```rust
let join_handle = std::thread::spawn(move || {
    relay_log::debug!("Start garbage collection thread");
    while let Ok(object) = rx.recv() {
        // TODO: Log size of channel queue as a gauge here
        drop(object); // the drop itself is the work this thread offloads
    }
});
```
Contributor:

afaik the only way to do this is by having your own AtomicU32 to keep track of the number of entries inside it. The other metrics option is to count on both ends; then you'll have two rates which you can subtract, and no need to handle an atomic.
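
A minimal sketch of the atomic-counter option, assuming a hand-rolled wrapper around std's channel, whose Receiver has no len() (all names here are hypothetical):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{Receiver, SendError, Sender};
use std::sync::Arc;

/// Hypothetical wrapper that counts on both ends: increment on send,
/// decrement on receive, so the backlog can be read at any time for a gauge.
struct CountingSender<T> {
    tx: Sender<T>,
    len: Arc<AtomicUsize>,
}

impl<T> CountingSender<T> {
    fn send(&self, value: T) -> Result<(), SendError<T>> {
        // Increment first so the receiver can never observe a negative backlog.
        self.len.fetch_add(1, Ordering::Relaxed);
        self.tx.send(value).map_err(|err| {
            self.len.fetch_sub(1, Ordering::Relaxed);
            err
        })
    }

    /// Current number of queued entries, for reporting as a gauge.
    fn queue_size(&self) -> usize {
        self.len.load(Ordering::Relaxed)
    }
}

/// The receiving side shares the same counter and decrements on recv.
fn recv_counted<T>(rx: &Receiver<T>, len: &AtomicUsize) -> Option<T> {
    let value = rx.recv().ok()?;
    len.fetch_sub(1, Ordering::Relaxed);
    Some(value)
}
```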

Contributor:

question is if you need this at all. the risk you are trying to mitigate is a disposal thread that doesn't keep up and the channel growing indefinitely. Slowing down the channel is not helpful with that, and neither is having a metric: it will only let you figure out, eventually, that the metric is growing together with the memory.

The way to really mitigate this is to make the channel bounded: if the channel is full, you can log the error to Sentry and start dropping in the sender thread again.
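
A minimal sketch of that mitigation with std's sync_channel; the capacity and error handling are assumptions, not this PR's code:

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

/// Hypothetical non-blocking disposal: if the background thread doesn't keep
/// up and the bounded channel fills, report it and drop on this thread again.
fn dispose<T>(tx: &SyncSender<T>, object: T) {
    match tx.try_send(object) {
        Ok(()) => {}
        Err(TrySendError::Full(object)) => {
            relay_log::error!("garbage disposal queue is full, dropping inline");
            drop(object);
        }
        Err(TrySendError::Disconnected(object)) => drop(object),
    }
}
```

Usage: create the channel with `std::sync::mpsc::sync_channel(capacity)` instead of `channel()`; the receiver loop stays unchanged.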

Member Author:

Sorry, I somehow missed this comment and added the metric anyway. The bounded-channel idea makes sense to me; let's see what values the metric shows and then follow up if necessary.

@flub (Contributor) commented Aug 12, 2022

I don't really understand what joining this thread buys us? It seems like a lot of complexity for no gain.

@jjbayer (Member Author) commented Aug 12, 2022

> I don't really understand what joining this thread buys us? It seems like a lot of complexity for no gain.

The benefit is that I can join on it in the unit test to verify its behavior :)
But I agree about the complexity... For a long-running service, it should not matter. Maybe I'll revert that change after all.
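
A minimal sketch of what that test looks like, simplified to a bare channel (hypothetical, not the PR's actual unit test):

```rust
#[test]
fn disposal_thread_drains_and_exits() {
    let (tx, rx) = std::sync::mpsc::channel::<String>();
    let handle = std::thread::spawn(move || {
        while let Ok(object) = rx.recv() {
            drop(object);
        }
    });

    tx.send("project".to_owned()).unwrap();
    drop(tx); // closing the last sender ends the recv loop...
    handle.join().unwrap(); // ...so a successful join proves the thread drained and exited
}
```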

@jjbayer jjbayer marked this pull request as ready for review August 18, 2022 06:53
@jjbayer jjbayer requested review from a team and flub August 18, 2022 06:53
@flub (Contributor) left a comment:

It might be worth adding the bounded queue as a follow-up task once you have some metrics on its size.

@flub (Contributor) left a comment:

what happened to the custom sampling of this metric? did you write down the motivation for getting rid of it somewhere?

```rust
}

// Log garbage queue size:
let queue_size = self.garbage_disposal.queue_size() as f64;
```
Contributor:

this alone was enough reason for me to report queue size as u64 rather than usize :)

Member:

Can you clarify? You could equally have queue_size as a usize here and then cast it to an f64.

Contributor:

Wait, why is this f64 and not u64? Only just noticed that and now I'm confused.

  • Metrics are reported as u64.
  • We don't really know what sizes a channel can get up to, as this is not documented AFAIK, but we can probably assume we'd crash before we reach the max size.
  • That means deciding to expose it as a usize is as arbitrary as exposing it as a u64.
  • But now here you are led to believe that the true value is represented by a usize and you're supposedly casting here. This is misleading; see the sketch below.
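
A sketch of the signature this line of argument leads to, assuming the counter is an atomic inside the utility (names are hypothetical):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

struct GarbageDisposal {
    queue_size: AtomicUsize,
}

impl GarbageDisposal {
    /// Metrics are reported as u64, so expose the backlog as u64 directly;
    /// the internal usize counter stays an implementation detail.
    pub fn queue_size(&self) -> u64 {
        self.queue_size.load(Ordering::Relaxed) as u64
    }
}
```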

```rust
fn name(&self) -> &'static str {
    match self {
        RelayGauges::NetworkOutage => "upstream.network_outage",
        RelayGauges::ProjectCacheGarbageQueueSize => "project_cache.garbage.queue_size",
    }
}
```
Contributor:

if all this was to get rid of the nonsensical tag, I'd rather have gone with passing a string for the tag into the constructor. but whatever, either will do.

Member Author:

The main motivation was to get rid of the `i % 100` condition for logging. The utility class no longer concerns itself with statsd; it simply offers a method to query its queue size, and it's up to the caller when and how to log that information, which I think is a better separation of concerns.
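
A sketch of that split (the ProjectCache/GarbageDisposal shapes are assumptions; the f64 cast mirrors the diff above, even though the earlier thread argues for exposing u64):

```rust
impl ProjectCache {
    /// Hypothetical caller-side reporting: the disposal utility only exposes
    /// its backlog, and ProjectCache decides when and how often to emit it.
    fn report_garbage_queue_size(&self) {
        let queue_size = self.garbage_disposal.queue_size() as f64;
        relay_statsd::metric!(gauge(RelayGauges::ProjectCacheGarbageQueueSize) = queue_size);
    }
}
```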

Contributor:

(Note: I approved, so I'm not asking for changes.)
The metrics are either needed or not; it shouldn't be up to the caller to think of hooking up the metrics. We're not building a library but an application (and even then, but that's a larger topic).

Co-authored-by: Iker Barriocanal <[email protected]>
@jjbayer jjbayer merged commit dee2494 into master Aug 18, 2022
@jjbayer jjbayer deleted the feat/garbage-disposal-2 branch August 18, 2022 09:24