fix(memory, docker): support for reading cgroup data #8519

Open · bigcat88 wants to merge 3 commits into master from fix/memory-cgroup
Conversation

bigcat88
Contributor

Alternative to PR #8511.

The logic is the same; the code is just rewritten to be shorter and uses a cache for the total available memory, which, as far as I know, can't be changed without restarting the container, so it can safely be read once.

I also changed it so that this doesn't apply to MPS, since MPS can be shared with containers and we don't want this new code running on macOS.

There is also a small optimization: check the cached limit value first and, if it's undefined, skip reading the cgroup usage data.

limit = _cgroup_limit_bytes()
used = _cgroup_used_bytes() if limit is not None else None

If someone can test this on Kubernetes, that would be great.
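
For reference, the underlying helpers could look something like the sketch below. This is a minimal illustration, not the PR's exact code; it assumes the standard mount points for cgroup v2 (memory.max, memory.current) and cgroup v1 (memory.limit_in_bytes, memory.usage_in_bytes), and the helper names simply mirror the snippet above.

import os

_V2_LIMIT = "/sys/fs/cgroup/memory.max"
_V2_USED = "/sys/fs/cgroup/memory.current"
_V1_LIMIT = "/sys/fs/cgroup/memory/memory.limit_in_bytes"
_V1_USED = "/sys/fs/cgroup/memory/memory.usage_in_bytes"

def _read_cgroup_int(path):
    # Returns the integer value of a cgroup file, or None if the file is
    # missing or holds the cgroup v2 "max" sentinel (no limit set).
    try:
        with open(path) as f:
            data = f.read().strip()
    except OSError:
        return None
    if data == "max":
        return None
    try:
        return int(data)
    except ValueError:
        return None

def _cgroup_limit_bytes():
    # Prefer cgroup v2, fall back to cgroup v1.
    path = _V2_LIMIT if os.path.exists(_V2_LIMIT) else _V1_LIMIT
    return _read_cgroup_int(path)

def _cgroup_used_bytes():
    path = _V2_USED if os.path.exists(_V2_USED) else _V1_USED
    return _read_cgroup_int(path)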

@bigcat88 bigcat88 requested a review from comfyanonymous as a code owner June 13, 2025 12:21
@bigcat88 bigcat88 force-pushed the fix/memory-cgroup branch from dd592b5 to 673c635 Compare June 13, 2025 12:28
@asagi4
Contributor

asagi4 commented Jun 14, 2025

Does it make any sense to cache these reads explicitly? sysfs is an in-memory filesystem, so reading from it doesn't even cause any IO. I doubt you need to complicate the code with caching.

@bigcat88
Contributor Author

bigcat88 commented Jun 14, 2025

> Does it make any sense to cache these reads explicitly? sysfs is an in-memory filesystem, so reading from it doesn't even cause any IO.

Only one value is cached. It helps determine whether we should call psutil.virtual_memory() in get_free_memory.

Reading the file is still a system call into kernel space, AFAIK. Ideally this code would just be moved to a separate file with helper functions instead of living in model_management.py.

When Python improves its bytecode compilation (they have honestly been trying to get this right for three years now), the difference will be much more noticeable.

Current results from a small script:

root@14a3e4294591:/app# python3 test.py 
--- CGroup Read vs. LRU Cache ---
This version handles 'max' as a special value (999999).
Memory limit is 'max'. Proceeding with benchmark.
Performing 2,000,000 calls each.

Direct Read : 5149.76 ns per call
Cached Read : 17.01 ns per call

Cached version is 302.8x faster.
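
A benchmark along these lines could look like the following. This is a hypothetical reconstruction of such a test script, not the exact test.py above; it assumes the _cgroup_limit_bytes helper from the earlier sketch.

import functools
import time

@functools.lru_cache(maxsize=1)
def _cgroup_limit_cached():
    return _cgroup_limit_bytes()

def bench(fn, n=2_000_000):
    # Average wall-clock time per call, in nanoseconds.
    start = time.perf_counter_ns()
    for _ in range(n):
        fn()
    return (time.perf_counter_ns() - start) / n

direct = bench(_cgroup_limit_bytes)
cached = bench(_cgroup_limit_cached)
print(f"Direct Read : {direct:.2f} ns per call")
print(f"Cached Read : {cached:.2f} ns per call")
print(f"Cached version is {direct / cached:.1f}x faster.")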

@asagi4
Contributor

asagi4 commented Jun 14, 2025

I mean, sure it's faster, but how many times is the value read? Is it really code where 5 microseconds makes a difference? I haven't measured, so maybe it is, but I wouldn't be surprised if importing lru_cache costs more time than the caching can ever save.

@bigcat88
Contributor Author

bigcat88 commented Jun 15, 2025

In some ways you are right: instead of lru_cache we can simply use a global variable, and the code will be smaller.

Thank you for insisting; this way the code really looks nicer.

Tested this PR on RunPod (cgroup v1); the correct RAM value is now displayed.
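
The global-variable variant could look something like this minimal sketch (again assuming the hypothetical _cgroup_limit_bytes and _cgroup_used_bytes helpers from above; the sentinel distinguishes "not read yet" from "read, but no limit set"):

_NOT_READ = object()  # sentinel: cache not populated yet
_cgroup_limit_cache = _NOT_READ

def get_cgroup_limit():
    global _cgroup_limit_cache
    if _cgroup_limit_cache is _NOT_READ:
        # The limit can't change without restarting the container,
        # so one read per process is enough.
        _cgroup_limit_cache = _cgroup_limit_bytes()
    return _cgroup_limit_cache

def get_free_cgroup_memory():
    # Skip reading usage entirely when no limit is set.
    limit = get_cgroup_limit()
    if limit is None:
        return None
    used = _cgroup_used_bytes()
    return limit - used if used is not None else None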
