fix(memory, docker): support for reading cgroup data #8519

Open · bigcat88 wants to merge 3 commits into master from fix/memory-cgroup
Conversation

bigcat88
Contributor

Alternative to PR #8511.

The logic is the same; the code is just rewritten to be shorter and uses a cache for the total available memory, which, as far as I know, can't be changed without restarting the container, so it can safely be read once.

I also changed it so that this doesn't apply to MPS, since MPS can be shared with containers and we don't want this new code running on macOS.

There is also a small optimization: check the cached limit value first and, if it's undefined, skip reading the cgroup usage data.

limit = _cgroup_limit_bytes()
used = _cgroup_used_bytes() if limit is not None else None

If someone can test this on Kubernetes, that would be great.
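
For reference, the underlying helpers could look something like the sketch below. This is a minimal illustration, not the PR's exact code; it assumes the standard mount points for cgroup v2 (memory.max, memory.current) and cgroup v1 (memory.limit_in_bytes, memory.usage_in_bytes), and the helper names simply mirror the snippet above.

import os

_V2_LIMIT = "/sys/fs/cgroup/memory.max"
_V2_USED = "/sys/fs/cgroup/memory.current"
_V1_LIMIT = "/sys/fs/cgroup/memory/memory.limit_in_bytes"
_V1_USED = "/sys/fs/cgroup/memory/memory.usage_in_bytes"

def _read_cgroup_int(path):
    # Returns the integer value of a cgroup file, or None if the file is
    # missing or holds the cgroup v2 "max" sentinel (no limit set).
    try:
        with open(path) as f:
            data = f.read().strip()
    except OSError:
        return None
    if data == "max":
        return None
    try:
        return int(data)
    except ValueError:
        return None

def _cgroup_limit_bytes():
    # Prefer cgroup v2, fall back to cgroup v1.
    path = _V2_LIMIT if os.path.exists(_V2_LIMIT) else _V1_LIMIT
    return _read_cgroup_int(path)

def _cgroup_used_bytes():
    path = _V2_USED if os.path.exists(_V2_USED) else _V1_USED
    return _read_cgroup_int(path)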

@bigcat88 bigcat88 requested a review from comfyanonymous as a code owner June 13, 2025 12:21
@bigcat88 bigcat88 force-pushed the fix/memory-cgroup branch from dd592b5 to 673c635 Compare June 13, 2025 12:28
@asagi4
Contributor

asagi4 commented Jun 14, 2025

Does it make any sense to cache these reads explicitly? sysfs is an in-memory filesystem, so reading from it doesn't even cause any IO. I doubt you need to complicate the code with caching.

@bigcat88
Contributor Author

bigcat88 commented Jun 14, 2025

> Does it make any sense to cache these reads explicitly? sysfs is an in-memory filesystem, so reading from it doesn't even cause any IO.

Only one value is cached. It helps determine whether we should call psutil.virtual_memory() in get_free_memory.

Reading the file is still a system call into kernel space, AFAIK. Ideally this code would just be moved to a separate file with helper functions instead of living in model_management.py.

When Python improves its bytecode compilation (they have honestly been trying to get this right for three years now), the difference will be much more noticeable.

Current results from a small script:

root@14a3e4294591:/app# python3 test.py 
--- CGroup Read vs. LRU Cache ---
This version handles 'max' as a special value (999999).
Memory limit is 'max'. Proceeding with benchmark.
Performing 2,000,000 calls each.

Direct Read : 5149.76 ns per call
Cached Read : 17.01 ns per call

Cached version is 302.8x faster.
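
A benchmark along these lines could look like the following. This is a hypothetical reconstruction of such a test script, not the exact test.py above; it assumes the _cgroup_limit_bytes helper from the earlier sketch.

import functools
import time

@functools.lru_cache(maxsize=1)
def _cgroup_limit_cached():
    return _cgroup_limit_bytes()

def bench(fn, n=2_000_000):
    # Average wall-clock time per call, in nanoseconds.
    start = time.perf_counter_ns()
    for _ in range(n):
        fn()
    return (time.perf_counter_ns() - start) / n

direct = bench(_cgroup_limit_bytes)
cached = bench(_cgroup_limit_cached)
print(f"Direct Read : {direct:.2f} ns per call")
print(f"Cached Read : {cached:.2f} ns per call")
print(f"Cached version is {direct / cached:.1f}x faster.")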

@asagi4
Contributor

asagi4 commented Jun 14, 2025

I mean, sure it's faster, but how many times is the value read? Is it really code where 5 microseconds makes a difference? I haven't measured, so maybe it is, but I wouldn't be surprised if importing lru_cache costs more time than the caching can ever save.

@bigcat88
Contributor Author

bigcat88 commented Jun 15, 2025

In some ways you are right: instead of lru_cache we can simply use a global variable, and the code will be smaller.

Thank you for insisting; this way the code really looks nicer.

Tested this PR on RunPod (cgroup v1); the correct RAM value is now displayed.
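
The global-variable variant could look something like this minimal sketch (again assuming the hypothetical _cgroup_limit_bytes and _cgroup_used_bytes helpers from above; the sentinel distinguishes "not read yet" from "read, but no limit set"):

_NOT_READ = object()  # sentinel: cache not populated yet
_cgroup_limit_cache = _NOT_READ

def get_cgroup_limit():
    global _cgroup_limit_cache
    if _cgroup_limit_cache is _NOT_READ:
        # The limit can't change without restarting the container,
        # so one read per process is enough.
        _cgroup_limit_cache = _cgroup_limit_bytes()
    return _cgroup_limit_cache

def get_free_cgroup_memory():
    # Skip reading usage entirely when no limit is set.
    limit = get_cgroup_limit()
    if limit is None:
        return None
    used = _cgroup_used_bytes()
    return limit - used if used is not None else None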
