[release-4.20] OCPNODE-3722: Enforce OCP 4.20 and earlier cluster to have AutoSizingReserved disabled by default #5387
base: release-4.20
|
Skipping CI for Draft Pull Request. |
2507d91 to e8e4d53
|
/test all |
e8e4d53 to c9f9e79
|
@ngopalak-redhat: This pull request references OCPNODE-3718 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.20.z" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/test unit |
|
@ngopalak-redhat: This pull request references OCPNODE-3722 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/test bootstrap-unit |
a958166 to b12addd
|
/test all |
|
@haircommander @sairameshv Can you please review? |
|
/hold Until rebasing and testing the rebase |
6b55488 to f823943
|
/unhold All tests are passing with rebase |
|
/retest LGTM, @sairameshv @QiWang19 PTAL |
sairameshv
left a comment
Left a few nits to be addressed.
The changes LGTM
ignConfig := ctrlcommon.NewIgnConfig()
autoSizingMC, err := ctrlcommon.MachineConfigFromIgnConfig(pool.Name, autoSizingDisabledMCName, ignConfig)
nit: ignConfig gets replaced in later steps, so why not create rawAutoSizingIgn directly and pass it when creating the machine config?
Suggested change:
- autoSizingMC, err := ctrlcommon.MachineConfigFromIgnConfig(pool.Name, autoSizingDisabledMCName, ignConfig)
+ autoSizingMC, err := ctrlcommon.MachineConfigFromIgnConfig(pool.Name, autoSizingDisabledMCName, rawAutoSizingIgn)
return autoSizingMC, nil
}
// createAutoSizingIgnConfig creates the Ignition config with environment variables
Does this sound more appropriate?
Suggested change:
- // createAutoSizingIgnConfig creates the Ignition config with environment variables
+ // createDefaultAutoSizingIgnConfig creates the Ignition config with the default auto sizing environment variable contents
|
@sairameshv Can you review again? Only change is the removal of the unit test |
…rved disable by default
f7552c3 to 8f0b0a8
sairameshv
left a comment
LGTM apart from minor nits
/lgtm
|
/retest-required |
|
@ngopalak-redhat: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
LGTM /approve Looks like it is unhappy with the jira card though |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ngopalak-redhat, sairameshv, umohnani8 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Fixes: #OCPNODE-3722
- What I did
This patch introduces the 50-master-auto-sizing-disabled MachineConfig to OpenShift 4.20 clusters, setting the NODE_SIZING_ENABLED flag to false by default on master and worker nodes.
This change is required because auto sizing becomes enabled by default for clusters created with 4.21 and above.
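For readers unfamiliar with how a MachineConfig carries a file, here is a minimal, self-contained sketch of an Ignition v3 style payload that writes NODE_SIZING_ENABLED=false. It is not the code from this PR: the struct types are simplified stand-ins for the Ignition library types, the helper names (dataURL, autoSizingDisabledIgn) are hypothetical, and the file path /etc/node-sizing-enabled.env and the "3.2.0" spec version are assumptions. The real file may also carry additional variables.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// Simplified, hypothetical mirror of the Ignition v3 file layout.
// The MCO itself uses the Ignition library types, not these structs.
type fileContents struct {
	Source string `json:"source"`
}

type ignFile struct {
	Path      string       `json:"path"`
	Mode      int          `json:"mode"`
	Overwrite bool         `json:"overwrite"`
	Contents  fileContents `json:"contents"`
}

type ignConfig struct {
	Ignition struct {
		Version string `json:"version"`
	} `json:"ignition"`
	Storage struct {
		Files []ignFile `json:"files"`
	} `json:"storage"`
}

// envFilePath is an assumption made for this sketch.
const envFilePath = "/etc/node-sizing-enabled.env"

// dataURL encodes file contents as a base64 data URL, the form Ignition
// accepts in contents.source.
func dataURL(content string) string {
	return "data:text/plain;charset=utf-8;base64," +
		base64.StdEncoding.EncodeToString([]byte(content))
}

// autoSizingDisabledIgn builds a config with a single env file that turns
// auto node sizing off.
func autoSizingDisabledIgn() ignConfig {
	var cfg ignConfig
	cfg.Ignition.Version = "3.2.0" // assumed spec version
	cfg.Storage.Files = []ignFile{{
		Path:      envFilePath,
		Mode:      0644, // serialized as decimal 420 in JSON
		Overwrite: true,
		Contents:  fileContents{Source: dataURL("NODE_SIZING_ENABLED=false\n")},
	}}
	return cfg
}

func main() {
	raw, err := json.MarshalIndent(autoSizingDisabledIgn(), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))
}
```

The resulting JSON is the kind of raw Ignition payload that would be embedded in a MachineConfig's spec.config field.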
Summary of changes
Enforce Default Autosizing: Ensures that clusters created in 4.20 will retain the pre-4.21 behavior of having auto node sizing disabled by default.
Upgrade Pre-requisite: This patch is a mandatory requirement for upgrading 4.20 clusters to 4.21. Changes to Cincinnati (OCPNODE-3722: Set minimum version of 4.20 required to upgrade to 4.21 cincinnati-graph-data#8277) will enforce that this patch must be present before the upgrade path to 4.21 is started.
User Override (Priority): The MachineConfig uses the prefix 01- to ensure it sets the initial default. If a user has already created a KubeletConfig to explicitly enable autoSizing (as per the KubeletConfig documentation), that explicit user configuration will take precedence (override this default) and will be retained when upgrading to 4.21.
Reference: This change addresses the shift in default behavior introduced in OpenShift 4.21, where NODE_SIZING_ENABLED is set to true for all new clusters: #5390
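The user-override point above relies on MachineConfigs being merged in name order, with later names winning for the same file path. The sketch below models only that rule and is not the MCO's rendering code. The machineConfig type and renderFiles helper are hypothetical, and both the generated MachineConfig name 99-master-generated-kubelet and the file path are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// machineConfig is a hypothetical, stripped-down stand-in that only tracks
// the files a MachineConfig would write.
type machineConfig struct {
	Name  string
	Files map[string]string // path -> contents
}

// envFilePath is an assumption made for this sketch.
const envFilePath = "/etc/node-sizing-enabled.env"

// renderFiles applies configs in lexicographic name order, so a later name
// overwrites an earlier one for the same path (last writer wins).
func renderFiles(configs []machineConfig) map[string]string {
	sorted := append([]machineConfig(nil), configs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })

	rendered := make(map[string]string)
	for _, mc := range sorted {
		for path, contents := range mc.Files {
			rendered[path] = contents
		}
	}
	return rendered
}

func main() {
	defaultOff := machineConfig{
		Name:  "50-master-auto-sizing-disabled",
		Files: map[string]string{envFilePath: "NODE_SIZING_ENABLED=false\n"},
	}
	// Assumed name for the MachineConfig generated from a user's KubeletConfig.
	userOn := machineConfig{
		Name:  "99-master-generated-kubelet",
		Files: map[string]string{envFilePath: "NODE_SIZING_ENABLED=true\n"},
	}

	// With a user KubeletConfig present, the user's value is rendered.
	fmt.Print(renderFiles([]machineConfig{userOn, defaultOff})[envFilePath])
	// Without one, the default from this patch applies.
	fmt.Print(renderFiles([]machineConfig{defaultOff})[envFilePath])
}
```

Under this model the default only needs a name that sorts before the generated kubelet MachineConfig for an explicit user setting to survive.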
Additional Notes for Developers
The approach taken in this PR is patterned after the change implemented in #4715, which was used to modify the default container runtime.
Rejected Alternatives
We explored several alternative solutions, but they were not feasible:
In-Place Upgrade Handling: We found that direct handling during the 4.21 upgrade was unreliable. After multiple upgrade cycles, there was no consistent mechanism to identify clusters originally provisioned before 4.21.
Changing the Default File: Switching the default configuration file (e.g., away from /etc/node-sizing-enabled) was overly complex, requiring us to manually manage legacy configuration paths for existing clusters.
Installer-Created KubeletConfig: Since OpenShift clusters do not contain a default KubeletConfig resource, one option was to have the installer create it. This was rejected because Hypershift deployments may bypass the standard OCP installer.
Adding a Default KubeletConfig Resource: This approach was dismissed because OpenShift allows only a single KubeletConfig per cluster. Introducing a default resource risks a user's explicit KubeletConfig unintentionally overriding the system default, leading to confusion.
- How to verify it
Verified the patch on a 4.20 cluster: Created a cluster using ClusterBot, applied the patch via oc adm upgrade, confirmed the new MachineConfig was created, and ensured auto node sizing was disabled.
Direct Patch Verification: Created a cluster using ClusterBot with the patch applied and confirmed auto node sizing was disabled.
User Override Test: Created a KubeletConfig to explicitly enable auto sizing and verified that the setting was correctly enabled (overriding the default).
Upgrade Path Validation: Successfully upgraded the patched cluster to 4.21 (using the above referenced 4.21 PR changes). Confirmed that auto node sizing remained disabled for upgraded clusters that had not been explicitly configured otherwise.
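The "auto node sizing was disabled" checks above could be automated by parsing the env file read from a node. The following is a rough sketch of such a check, not something that ships with this PR. The nodeSizingEnabled helper is hypothetical, and it assumes the file uses plain KEY=VALUE lines with a boolean value for NODE_SIZING_ENABLED.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// nodeSizingEnabled scans KEY=VALUE lines and returns the boolean value of
// NODE_SIZING_ENABLED. It returns an error if the key is missing or the
// value is not a valid boolean.
func nodeSizingEnabled(envFile string) (bool, error) {
	scanner := bufio.NewScanner(strings.NewReader(envFile))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and comments
		}
		parts := strings.SplitN(line, "=", 2)
		if len(parts) != 2 || strings.TrimSpace(parts[0]) != "NODE_SIZING_ENABLED" {
			continue
		}
		return strconv.ParseBool(strings.TrimSpace(parts[1]))
	}
	if err := scanner.Err(); err != nil {
		return false, err
	}
	return false, fmt.Errorf("NODE_SIZING_ENABLED not found")
}

func main() {
	// Sample contents; on a real node this would be read from the env file.
	sample := "# written by a MachineConfig\nNODE_SIZING_ENABLED=false\n"
	enabled, err := nodeSizingEnabled(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("auto node sizing enabled:", enabled)
}
```

The same helper covers the override test: feeding it the file from a node with an explicit KubeletConfig should report true.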
- Description for the changelog
Introduces the auto sizing MachineConfig, ensuring the feature remains disabled by default during upgrade