KEP-3008: QoS-class resources #3004
Conversation
Skipping CI for Draft Pull Request.
/cc @kad @haircommander
Hi @marquiz, all KEP PRs must have an open issue in k/enhancements (this repo). Please open an issue and fill it out completely, and rename this PR to KEP-<issue number>, both in the title of this PR and in your README.md. Thanks!
Thanks for the guidance @kikisdeliveryservice. Done
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/retitle KEP-3008: Class-based resources
Addressing review feedback from @thockin.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: marquiz
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Updated details on cluster autoscaler
- bump versions in kep.yaml
- fix typos
- add a use case for per-container OOM kill behavior
Small update: added per-container OOM kill behavior as a use case. There is a growing list of use cases for Kubernetes-managed QoS resources, with links to ongoing efforts (e.g. swap and OOM kill).
@marquiz As the person who's been most actively pushing for the OOM config option and just found out about this KEP, I strongly agree that this is a great fit! This design seems awesome and is a really nice generalization of the specific problem I was hoping to have solved!
container whose memory limit is higher than the requests, but that is treated
by the node as `Guaranteed`.

Taking this idea further, QoS-class resources could also make it possible to
Maybe kubernetes/kubernetes#78848 is a good tangible use case to mention.
Thanks @kannon92 for the reference, I'll add it to the proposal. This is already speculated on in the Splitting Pod QoS Class section, but I wasn't aware of this open issue.
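To illustrate what splitting the Pod QoS class could look like from the user's perspective, here is a minimal sketch; the `qos-resources` stanza and the `memory-qos` class name are hypothetical placeholders, not the KEP's settled API:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-split-example
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    resources:
      requests:
        memory: "256Mi"
      limits:
        memory: "512Mi"  # limit > request: today this makes the pod Burstable
      # Hypothetical QoS-class resource (illustrative names only):
      # ask the node to treat this container as Guaranteed for memory
      # even though its limit exceeds its request.
      qos-resources:
        memory-qos: guaranteed
```

If something along these lines landed, the class name would select node-level behavior without changing the pod's resources-based QoS class.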
QoS-class resource. Likely benefits of using the QoS-class resources mechanism
would be the ability to set per-namespace defaults with LimitRanges and to allow
permission control of high-priority classes with ResourceQuotas.
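As a concrete sketch of the ResourceQuota benefit mentioned above, assuming QoS-class resources were surfaced under a hypothetical `qos-resources/` prefix and an `rdt-gold` class existed, a namespace quota could look roughly like this:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: rdt-gold-quota
  namespace: team-a
spec:
  hard:
    # Hypothetical resource name: cap how many pods in this namespace
    # may request the high-priority "gold" RDT class.
    qos-resources/rdt-gold: "10"
```

A LimitRange could similarly inject a per-namespace default class for pods that do not specify one.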
Another idea is smarter OOM setups (systemd-oomd, for example).
systemd-oomd allows one to configure OOM settings (PSI-based pressure and swap usage) on a cgroup slice. I could envision a future where one can create a QoS class that gives knobs to control systemd-oomd for more aggressive OOM killing.
https://fedoraproject.org/wiki/Changes/EnableSystemdOomd
I was playing around with this and explored the idea of setting different knobs based on the existing kubepods slices.
Interesting. So you are thinking about pre-defined OOM kill classes (in contrast to, e.g., exposing every possible systemd-oomd knob to the user). At first thought, this sounds like a good fit. I'll add this to the proposal too, as a possible future use case.
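To make the pre-defined classes idea concrete, a pod might opt into an OOM kill class roughly as below; the `oom-kill` resource and the `aggressive` class are purely illustrative, and on the node side the class would map to whatever systemd-oomd settings the administrator has pre-defined:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: oomd-class-example
spec:
  containers:
  - name: batch-worker
    image: registry.k8s.io/pause:3.9
    resources:
      # Hypothetical QoS-class resource: select a pre-defined OOM kill
      # class rather than exposing raw systemd-oomd knobs to the user.
      qos-resources:
        oom-kill: aggressive
```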
that kubelet could evict a running pod that requests QoS-class resources that
are no longer available on the node. This should be relatively straightforward to
implement, as kubelet knows what QoS-class resources are available on the node
and also monitors all running pods.
Some areas where the eviction manager is starting to show its age:
- If one wants to add more disks to kubelet, we lose the ability for the eviction manager to monitor them.
- PSI-based eviction
- Swap-based eviction
- Moving things out of the root partition (logs come to mind)
I've been thinking that eventually we are going to need some kind of pluggable eviction manager based on certain resources. It should be general, but I haven't really made much progress on it. I wanted to throw this out there as it may be worth considering.
@kannon92 thanks for the idea. Do you have a PoC implementation of this? If so, it would be nice to wire it up to my code and see how it works.
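On the "kubelet knows what QoS-class resources are available" point, one way to picture the eviction check is the node advertising its available classes in status, which an eviction component (pluggable or not) could diff against the classes requested by running pods. The `qosResources` status field below is a speculative sketch, not part of any existing API:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-1
status:
  # Hypothetical field listing the QoS classes currently available on
  # this node. A running pod requesting a class missing from this list
  # would become a candidate for eviction.
  qosResources:
    rdt: ["bronze", "silver", "gold"]
    oom-kill: ["default", "aggressive"]
```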
<!-- References -->
[intel-rdt]: https://www.intel.com/content/www/us/en/architecture-and-technology/resource-director-technology.html
[linux-resctrl]: https://www.kernel.org/doc/html/latest/x86/resctrl.html
This URL now 404s
New link:
- [linux-resctrl]: https://www.kernel.org/doc/html/latest/x86/resctrl.html
+ [linux-resctrl]: https://docs.kernel.org/arch/x86/resctrl.html
Hey @marquiz, it's great to see that this is making progress! Do you have any updates on the timeline for when this version is planned to be released? The KEP indicates it's planned for 1.31, but I was wondering if it might end up being pushed to 1.32 or later instead.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
New KEP for adding class-based resources to the CRI protocol.