USHIFT-6080: Enable custom feature gates on MicroShift #1851
@copejon: This pull request references USHIFT-6080 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the spike to target the "4.21.0" version, but no target version was set. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/jira ushift-6177
A couple of initial comments; I might add more once this enhancement gets traction.
## Summary

MicroShift currently inherits feature gates from its OpenShift components but lacks a controlled mechanism for users to experiment with additional feature gates or override defaults. This enhancement proposes adding configuration support for Kubernetes and OpenShift feature gates through the MicroShift configuration file. This capability will enable users to experiment with alpha and beta OpenShift and Kubernetes features like CPUManager's `prefer-align-cpus-by-uncorecache` in a supported and deterministic way, addressing edge computing use cases where users want to evaluate advanced resource management capabilities.
From what I've seen before the leave, MicroShift does not inherit any settings: whatever is enabled in openshift isn't enabled in microshift. It inherits the FGs, but they're all disabled, I think.
That's why I had to enable these two to fix rebase: openshift/microshift@31598b2
ref: https://github.com/openshift/api/blob/master/features.md
I see now that the rebase script also handles disabling some of the featuregates that are pulled in with openshift manifests. Good catch!
### Non-Goals

* Modify OpenShift's feature gate defaults
This means we'll enable everything that's enabled on openshift, right?
Typo, it should say "MicroShift"
#### Component Integration

Feature gates will be applied to the following MicroShift components, which are integrated into the MicroShift runtime rather than running as separate processes:
When I dealt with FGs a couple of months ago, I was under the impression that all kube components receive info from the API server about the enabled feature gates, so there was no need to deal with each component config individually, but please keep me honest.
You could add another FG to https://github.com/openshift/microshift/blob/main/pkg/controllers/kube-apiserver.go#L220 and see in the logs if it's propagated.
Oh I see now. I'd read that each component has a `feature-gates` CLI flag in the format of `${FEATURE_GATE}=${BOOL}`. I see now the kube-apiserver is different in that the flag format is `${KUBE_COMPONENT||"KUBE"}:${FEATURE_GATE}=${BOOL}`.
Something else confuses me: the rebase script explicitly sets several featuregates in the kubelet config directly: https://github.com/openshift/microshift/blob/72976404859c8509cc0aa34c1b38255d3a03976e/scripts/auto-rebase/rebase.sh#L689-L692
yq -i '.featureGates.APIPriorityAndFairness = true' "${REPOROOT}/assets/core/kubelet.yaml"
yq -i '.featureGates.PodSecurity = true' "${REPOROOT}/assets/core/kubelet.yaml"
yq -i '.featureGates.DownwardAPIHugePages = true' "${REPOROOT}/assets/core/kubelet.yaml"
But the kubelet doesn't seem to accept them:
kubelet W0929 14:26:08.754914 1210239 feature_gate.go:328] unrecognized feature gate: APIPriorityAndFairness
kubelet W0929 14:26:08.754919 1210239 feature_gate.go:328] unrecognized feature gate: DownwardAPIHugePages
kubelet W0929 14:26:08.754923 1210239 feature_gate.go:328] unrecognized feature gate: PodSecurity
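For reference, the plain `${FEATURE_GATE}=${BOOL}` format mentioned above can be illustrated with a minimal sketch. It assumes the standard comma-separated `--feature-gates` flag used by Kubernetes components; the helper name `render_feature_gates_flag` is hypothetical and is not MicroShift code, and the sketch does not cover the component-prefixed kube-apiserver format.

```python
from __future__ import annotations


def render_feature_gates_flag(gates: dict[str, bool]) -> str:
    """Render a gate map as a --feature-gates flag (hypothetical helper)."""
    # Sort by gate name so the resulting flag value is deterministic.
    # Booleans are written in lowercase, e.g. "PodSecurity=true".
    pairs = [f"{name}={str(enabled).lower()}" for name, enabled in sorted(gates.items())]
    return "--feature-gates=" + ",".join(pairs)


print(render_feature_gates_flag({"PodSecurity": True, "DownwardAPIHugePages": False}))
```

A component that does not know a gate in this list would be expected to warn about it, much like the kubelet log lines above.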
Less of a concern for now but worth noting:
It looks like the default CSI driver also hardcodes a featuregate: https://github.com/openshift/microshift/blob/a88f42833d122e0bd3c47e2eb43e25f6c5024f8e/assets/optional/topolvm/03-topolvm.yaml#L999-L1005
Upgrades and downgrades are not supported when custom feature gates are configured (TechPreviewNoUpgrade or CustomNoUpgrade). Users must remove all custom feature gate configurations and return to default settings before attempting any version changes.

This limitation aligns with OpenShift's approach where TechPreviewNoUpgrade and CustomNoUpgrade feature sets explicitly prevent cluster upgrades to avoid compatibility issues with experimental features.
https://github.com/openshift/api/blob/master/config/v1/types_feature.go#L71-L73
It looks like you cannot go back from TP/DP/Custom FGs once enabled.
Undoing FGs on microshift and upgrading would be a deviation
- fixed typo in non-goals
- added DevPreviewNoUpgrade to featureSets
- featuregates are irreversible
- fixed typo in filename
## Proposal

This enhancement proposes adding feature gate configuration support to MicroShift by extending `/etc/microshift/config.yaml` with a configuration schema inspired by OpenShift's FeatureGate custom resource specification. In OpenShift, users configure feature gates through the FeatureGate API, and operators independently filter featureGates before applying them to their components. MicroShift takes a different approach aligned with its file-based configuration philosophy: users specify feature gates directly in the configuration file, and MicroShift passes all user-specified featureGates to the kube-apiserver, which then handles propagation to other Kubernetes components.
How do the operators filter FeatureGates? Do they use openshift config API or they ask API Server?
What if one of the operators wants to query that info but is unable due to differences - the operator on microshift would miss the functionality, right?
> How do the operators filter FeatureGates? Do they use openshift config API or they ask API Server?

Operators watch the FeatureGate API object named `cluster`.
For instance, the `machine-config-operator` watches FG `cluster`, filters kubelet-specific values, and handles r/w to kubelet configs and kubelet service restarts.

> What if one of the operators wants to query that info but is unable due to differences - the operator on microshift would miss the functionality, right?

Do you mean differences in how FeatureGates would be published on OpenShift vs MicroShift (config API vs file)?
> Do you mean differences in how FeatureGates would be published on OpenShift vs MicroShift (config API vs file)?

Exactly.
But I think if the need arises, we could just mirror file config -> FG CR
- **API Server Validation**: The kube-apiserver does not validate the feature gates it receives from MicroShift before propagating them. This behavior is the same on OpenShift.
- **Component-level Validation**: Each Kubernetes component will validate the feature gates it recognizes
- **Error Reporting**: Components will log errors or warnings for invalid feature gate configurations
- **Startup Failures**: May occur when featureGate settings conflict (i.e. a featureGate is both enabled and disabled)
We will implement this in MicroShift during config parsing, right?
Correct. I think that because these are new fields in the ushift config, ushift should handle some basic validation.
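A minimal sketch of the kind of basic validation discussed here: rejecting a gate that is listed as both enabled and disabled. The function names and the split into `enabled`/`disabled` lists are assumptions for illustration (modeled loosely on the FeatureGate `customNoUpgrade` fields), not the MicroShift implementation, which is written in Go.

```python
from __future__ import annotations


def find_conflicting_gates(enabled: list[str], disabled: list[str]) -> list[str]:
    """Return gate names present in both lists, sorted for stable error messages."""
    return sorted(set(enabled) & set(disabled))


def validate_feature_gates(enabled: list[str], disabled: list[str]) -> None:
    """Raise ValueError if any gate is both enabled and disabled (hypothetical check)."""
    conflicts = find_conflicting_gates(enabled, disabled)
    if conflicts:
        raise ValueError(
            "feature gates cannot be both enabled and disabled: " + ", ".join(conflicts)
        )
```

Failing at config parsing would surface the conflict before any component starts, rather than relying on each component to report it.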
Edge deployments often have limited remote access for troubleshooting. If users enable experimental feature gates that cause instability, recovering these devices may require physical access or complex recovery procedures.

**Upgrade Limitations and Irreversible Changes**
Enabling `TechPreviewNoUpgrade`, `DevPreviewNoUpgrade`, or `CustomNoUpgrade` feature sets cannot be undone and prevents both minor version updates and major upgrades. Once enabled, the cluster permanently loses the ability to perform standard updates. These feature sets are explicitly not recommended for production clusters due to their irreversible nature and update limitations, which conflicts with the typical edge deployment requirement for reliable, long-term operation and maintenance.
Do you think we should persist a history (or something similar) of enabled feature gates? Right now we have no mechanism that would prevent upgrading after a user reverts the FG settings.
We could use a file hidden in /var/lib/microshift (security through obscurity) and log the information on startup, so that when we get a sosreport, we get some info.
I agree with you. We need this file to both prevent cluster upgrades and to help in handling users' reverting feature customizations. More on this in my later comment
This enhancement does not test whether feature gates actually modify Kubernetes component behavior - that is the responsibility of upstream Kubernetes testing. Testing is limited to verifying that MicroShift correctly passes feature gates to the kube-apiserver.

**Upgrade Testing:**
Since upgrades are not supported when custom feature gates are configured, no additional upgrade testing is required for this enhancement. Default upgrade behavior without custom feature gates is already covered by existing MicroShift test suites.
Unless we add something like I described here: https://github.com/openshift/enhancements/pull/1851/files#r2410448992
Upgrades and downgrades proceed normally using standard MicroShift procedures with no additional considerations for feature gate handling.

**Custom Feature Gate Configurations:**
Upgrades and downgrades are not supported when custom feature gates are configured (TechPreviewNoUpgrade, DevPreviewNoUpgrade, or CustomNoUpgrade). Once custom feature gates are enabled, this configuration cannot be reverted - it is a permanent, one-way operation that permanently disables upgrade capability.
> permanent, one-way operation that permanently disables upgrade capability

How do you ensure this?
On OpenShift, reverting a `*NoUpgrade` featureSet, which includes the custom list of feature gates, is blocked by API schema validation (here).
MicroShift's config isn't defined by a schema, so I need to come up with a proactive approach to config updates - a simple lock file, like you suggested above, would be enough I think.
On OpenShift, cluster upgradability is determined by the CVO. The CVO checks all Operator API objs for `.status.conditions[].type: "Upgradeable"`, and if any are `false`, the CVO marks the cluster as a whole as un-upgradable (handled by this func here).
Again, I think this is probably solved by writing a lock file into `/var/lib/microshift`.
I'll amend the proposal to reflect this.
When no custom feature gates are configured, standard MicroShift version skew handling applies with no additional considerations.

### Custom Feature Gate Limitations
When custom feature gates are configured (TechPreviewNoUpgrade, DevPreviewNoUpgrade, or CustomNoUpgrade), upgrades and downgrades between minor versions are not expected to work. Users must remove custom feature gate configurations before attempting minor version changes.
> Users must remove custom feature gate configurations before attempting minor version changes.

I wonder if there's something that should not be disabled after enabling, maybe something that changes data in a way that cannot be returned from, idk
This statement just needs to be removed entirely I think.
Whether or not an alpha/beta feature is irreversible is something that would have to be kept track of by the controller that implements the feature, if at all.
### Reverting Custom Feature Gate Configurations To Default

**Reverting the cluster to its default feature-gates is unsupported and not recommended.**
This doesn't match https://github.com/openshift/enhancements/pull/1851/files#r2410457599
**Recovery Procedures:**
- Configuration changes only require MicroShift service restart, not full system reboot
- Invalid configurations prevent service startup but do not affect system stability
- Greenboot integration ensures automatic rollback if feature gates prevent successful startup
Does it? I'm not sure whether a user-edited file in /etc is subject to rollbacks.
I know there's some weird 3 way merge going on, but not sure if the user edited the file...
Oh, you're absolutely right. This statement needs to be removed.
Thanks for the thorough review @pmtk
I'll fix the discrepancies and amend the validation, fg reversion, and upgrade statements to reflect my responses.
…ture gates Provide implementation details on how cluster upgrades and changes to configured feature gate will be prevented.
MicroShift does not deploy these operators and must take a different approach, one aligned with its file-based configuration philosophy: users specify feature gates directly in the configuration file, and MicroShift passes all user-specified featureGates to the kube-apiserver, which then handles propagation to other Kubernetes components. Service restarts are executed by the cluster admin by restarting the MicroShift process.

> **Important!** The use of custom feature gates on OpenShift is irreversible and renders a cluster unable to be upgraded. This feature should only be used for testing alpha/beta features and should never be used in production.
Do we expect that a dev to test using feature gates and then build the final deployment system separately? Should we advise that as a best practice, rather than dev trying to use this feature and then remove it?
Yes to both questions. Enabling custom feature gates is not a supported configuration. It is strictly for users who wish to test alpha/beta kube features.
> Do we expect that a dev to test using feature gates and then build the final deployment system separately?

I think it's general expectation, at least with bootc. Or at least for me.
You play with dev machine, you figure out how to do stuff, but for the production you want to automate as much as possible - like automatic updates to new bootc image, onboarding (maybe FIDO), etc. I wouldn't expect a fleet of 1000 devices to be set up one by one.
4. **API Server Propagation**: All configured featureGates will be passed to the kube-apiserver, which handles propagation to other Kubernetes components (kubelet, kube-controller-manager, kube-scheduler). Service restarts are the responsibility of the cluster admin.
5. **Prevent Feature Gate Config Changes**: OpenShift prevents users from reverting custom feature gates via spec validation rules. This is not an option for the MicroShift config. Instead, MicroShift will check for custom feature gates at startup. If customizations exist, MicroShift will write a sentinel file to `/var/lib/microshift/`. This file will contain the custom feature gates. When MicroShift next restarts, it will check for this file and overwrite the in-memory config's feature gate settings with those stored in the sentinel file.

**Note**: MicroShift will not overwrite `/etc/microshift/config.yaml`. Only the in-memory config will be affected.
So can we use config snippets with this feature, or no?
Yes, that would be ideal
6. **Preventing Cluster Upgrades**: Upgrades on OpenShift are prevented at the cluster level by the cluster-version-operator, in conjunction with other OpenShift operators. However, MicroShift lacks these operators. Instead, MicroShift's install/upgrade logic will re-use the sentinel file described in #5. If the file exists, the cluster is un-upgradeable.
Ah, so perhaps we should add this to update troubleshooting? What error might we see if we fail an update because of the sentinel file?
Bootc users should refer to the journal to find the rpm install error.
Rpm users would see the error in stdout during rpm installation
@pacevedom @pmtk wdyt about that?
> Bootc users should refer to the journal to find the rpm install error.

Not sure if this is possible - AFAIK bootc is not reinstalling system or upgrading it on a package level, I think it's more of a snapshot of new root fs which you switch to on boot.
I think this would have to be failure at the beginning of the microshift startup. Something similar to "version based upgrade blocking" we haven't even used once.
This topic is mentioned in couple places, but I won't create extra comments
I see, good point. Handling it this way would also create a congruent behavior to what you propose when users change feature gates.
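The sentinel-file behavior described in items 5 and 6 can be sketched as follows, assuming the check runs at MicroShift startup as suggested in this thread. The file name, the JSON format, and both function names are hypothetical; the proposal only states that the file lives under `/var/lib/microshift/` and contains the custom feature gates.

```python
from __future__ import annotations

import json
from pathlib import Path


def effective_feature_gates(config_gates: dict[str, bool], sentinel: Path) -> dict[str, bool]:
    """Return the gates to use at startup, locking in the first custom configuration."""
    if sentinel.exists():
        # A previous start recorded custom gates. They override the in-memory
        # config; the config file on disk is never rewritten.
        return json.loads(sentinel.read_text())
    if config_gates:
        # First start with custom gates: record them so later config edits
        # cannot silently revert them.
        sentinel.parent.mkdir(parents=True, exist_ok=True)
        sentinel.write_text(json.dumps(config_gates, sort_keys=True))
    return dict(config_gates)


def upgrade_blocked(sentinel: Path) -> bool:
    """The presence of the sentinel marks the cluster as un-upgradeable."""
    return sentinel.exists()
```

Under these assumptions, a startup routine would call `upgrade_blocked` before any version migration and log a clear error, which is also where the troubleshooting guidance raised above could point users.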
4. Upgrade fails to proceed, preserving the current MicroShift version
5. Administrator must either:
   - Continue using the current version with custom feature gates
   - Wipe MicroShift's state (`$ sudo microshift-cleanup-data --all`) and restart MicroShift service (`$ sudo systemctl restart microshift`)
We currently advise users to open a support case before running a `cleanup-data --all`. Is there an easier way to accomplish removing the feature manually without having to do this?
I believe that's the goal: you don't want users to revert the FG because upgrades are not supported. It could be that FG makes some changes to the cluster that might render it unusable upon disabling FG.
We want to avoid situation like: enables FG, runs, disables FG, upgrades, re-enables FG.
To add: They are not guaranteed to be backwards compatible.
FeatureGates represent mostly alpha and beta features, meaning they are in active development and could differ significantly between minor or patch versions. Users may open support tickets to address this but more than likely the engineering team can't do much about it.
#### Standalone Clusters

This enhancement is primarily designed for standalone MicroShift deployments where administrators need direct control over feature gate configuration through the local configuration file.
In production environments and development environments both? Should we give some guidance about the differences in using feature gates per use case?
I'll reword this to be more explicit. FeatureGates should never be used in production environments. They exist to allow developers and users to experiment with k8s alpha/beta features before they become part of k8s by default.
Strike that. I had a crossed wire when writing this - stand alone cluster is a specific kind of OpenShift deployment and not MicroShift. Removed
## Test Plan

The testing strategy focuses on verifying the propagation functionality - that custom feature gate configurations are correctly parsed from the MicroShift configuration file and passed to the kube-apiserver, which then handles propagation to other Kubernetes components. Testing validates the parsing and delivery mechanism rather than feature gate functionality itself.
Will we still provide users with an example config in docs that is tested?
Yes, ty. Will add
## Graduation Criteria

The feature is planned to be released as GA directly.
So if I understand, we'll support using this feature, but not necessarily whatever is enabled with it? How do we define the support scope so it's clear to everyone?
> So if I understand, we'll support using this feature, but not necessarily whatever is enabled with it?
Correct. The MicroShift feature is the ability to configure feature gates, not the feature gates themselves.
> How do we define the support scope so it's clear to everyone?
I need to add more information on this. This OpenShift article provides the best detail I've found so far on support. However, it only distinguishes between DevPreview and TechPreview (sets of feature gates), and not individual feature gates.
When custom feature gates are configured (TechPreviewNoUpgrade, DevPreviewNoUpgrade, or CustomNoUpgrade), upgrades and downgrades between minor versions are not expected to work. Users must remove custom feature gate configurations before attempting minor version changes. | ||
|
||
### Feature Gate Consistency Across Components | ||
Feature gate skew can occur between embedded components. On OpenShift, this is a non-issue. On MicroShift, it is a known issue that one component's default may be to disable a feature, while another component enables it. This problem is tracked by [USHIFT-2813](https://issues.redhat.com/browse/USHIFT-2813). Solving this issue is outside the scope of this proposal. |
Sounds like we need to add a Known Issue to the docs?
I think we can wait on this for now. We have a ticket to fix this issue. It could end up being a doc fix, but that's not known right now.
|
||
## Operational Aspects of API Extensions | ||
|
||
Any changes to the MicroShift configuration schema must be backwards compatible by at least y-2 minor versions. |
Is this still true if we can't downgrade, or is it a non-issue with this feature?
It would be a non-issue for downgrades also. It just means that the target version of MicroShift will still be able to read the older version of the config.
### Reverting Custom Feature Gate Configurations To Default | ||
|
||
**Recovery Procedures:** | ||
- To restore MicroShift to a stable and supported state, users must run `$ sudo microshift-cleanup-data --all`, set `.featureGates: {}`, and restart MicroShift |
After we do this sequence, how do we test to make sure it worked as intended?
I think it would be enough to check the component logs or config files to verify there are no custom feature gates. Running `microshift-cleanup-data --all` and deleting `.featureGates` from the config is effectively wiping the slate clean.
A full cleanup wipes user data too from the cluster, as it removes all directories (including etcd).
Would it be enough to restart MicroShift instead? That should bring the control plane back to a stable state, regardless of the workloads (as we do not have operators). The kubelet writes its configuration every time MicroShift is restarted, and all components get their flags in a similar fashion (code or config file, but before starting). If a user's application breaks because it relies on specific FeatureGates, then that is on them?
Because featureGates are in either alpha or beta stages, there is no guarantee that disabling a feature also cleans up the changes that feature made to the cluster. This could lead to errors that aren't trivial to diagnose.
3. **Custom Feature Gates**: Support for individual feature gate enablement/disablement via `customNoUpgrade` configuration | ||
4. **API Server Propagation**: All configured featureGates will be passed to the kube-apiserver, which handles propagation to other Kubernetes components (kubelet, kube-controller-manager, kube-scheduler) | ||
4. **API Server Propagation**: All configured featureGates will be passed to the kube-apiserver, which handles propagation to other Kubernetes components (kubelet, kube-controller-manager, kube-scheduler). Service restarts are the responsibility of the cluster admin. | ||
5. **Prevent Feature Gate Config Changes**: OpenShift prevents users from reverting custom feature gates via spec validation rules. This is not an option for the MicroShift config. Instead, MicroShift will check for custom feature gates at startup. If customizations exist, MicroShift will write a sentinel file to `/var/lib/microshift/`. This file will contain the custom feature gates. When MicroShift next restarts, it will check for this file and overwrite the in-memory config's feature gate settings with those stored in the sentinel file. |
Or maybe it should fail to start if this was changed? I know it's "NoUpgrade", not "NoStart", but this would be explicit. In memory config overwrite seems a bit hidden
I mostly agree with you. The one sticking point for me is that OCP does almost the same thing as proposed here by silently reverting user changes to the FeatureGates API object. But maybe we can consider MicroShift enough of a different beast to justify not starting. Or perhaps some very loud logging would be sufficient?
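To make the "fail to start" alternative concrete, here is a minimal sketch of a startup check that records custom feature gates in a sentinel file and refuses to continue if the config later diverges from it. Everything here is hypothetical: the JSON file format, the function and exception names, and passing the sentinel path as a parameter are assumptions for illustration, not the proposal's implementation.

```python
import json
from pathlib import Path


class FeatureGateMismatch(Exception):
    """Raised when the configured feature gates differ from the recorded ones."""


def check_feature_gates_at_startup(configured: dict, sentinel: Path) -> dict:
    """Return the feature gates to run with, or raise on a mismatch.

    `configured` maps gate name -> bool as read from the config file.
    `sentinel` is the (assumed) file that records gates once applied.
    """
    if sentinel.exists():
        recorded = json.loads(sentinel.read_text())
        if recorded != configured:
            # Explicit failure instead of a silent in-memory overwrite.
            raise FeatureGateMismatch(
                f"custom feature gates were changed after being applied; "
                f"recorded={recorded} configured={configured}"
            )
        return recorded

    if configured:
        # First start with custom gates: record them for later comparisons.
        sentinel.write_text(json.dumps(configured, sort_keys=True))
    return configured
```

The silent-overwrite variant would return `recorded` where this sketch raises; the "very loud logging" variant would log at that same point and then return `recorded`.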
Invalid feature gate configurations in the MicroShift configuration file could prevent MicroShift components from starting. | ||
|
||
*Mitigation:* Kubernetes components inherently ignore unrecognized feature gate names, so typos or misspellings may not cause failures. Only invalid values for recognized gates can cause issues. Components provide clear error messages for such cases, and documentation will guide troubleshooting. Recommended that users run `microshift-cleanup-script`, delete the custom feature gates from `/etc/microshift/config.yaml` and restart the MicroShift service. | ||
*Mitigation:* Kubernetes components inherently ignore unrecognized feature gate names, so typos or misspellings may not cause failures. Components provide clear warning messages for such cases, and documentation will guide troubleshooting. Recommended that users run `microshift-cleanup-script`, correct the invalid config values in `/etc/microshift/config.yaml`, then restart the service. |
Not sure if using cleanup is strictly needed here. You can play with FGs on dev machine, so correcting an invalid value should be okay?
It's hard to say but I think we don't need to err this far on the side of caution. Will remove the cleanup.
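Since a mistyped gate name may not surface as a startup failure, a pre-flight check on a dev machine can flag it before the config is applied. The sketch below is a hypothetical helper: the list of known gate names must be supplied by the caller (it is not read from any Kubernetes component), and the gate names shown are made up.

```python
import difflib


def check_gate_names(requested, known):
    """Split requested gates into recognized ones and likely typos.

    `requested` maps gate name -> bool; `known` is a caller-supplied
    iterable of valid gate names (an assumption of this sketch).
    Returns (recognized, suggestions), where `suggestions` maps each
    unrecognized name to a list of close matches, possibly empty.
    """
    known_list = sorted(set(known))
    recognized = {n: v for n, v in requested.items() if n in known_list}
    suggestions = {
        name: difflib.get_close_matches(name, known_list)
        for name in requested
        if name not in known_list
    }
    return recognized, suggestions


if __name__ == "__main__":
    ok, typos = check_gate_names(
        {"ExampleGateA": True, "ExampleGatA": False},
        ["ExampleGateA", "ExampleGateB"],
    )
    for name, close in typos.items():
        print(f"unrecognized feature gate {name!r}; did you mean {close}?")
```

Correcting the reported name in `/etc/microshift/config.yaml` and restarting would then be enough, with no cleanup step.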
6. The kube-apiserver propagates the feature gates to other Kubernetes components (kubelet, kube-controller-manager, kube-scheduler) | ||
7. Each component processes the featureGates and enables/disables the features it supports according to the configured state | ||
5. MicroShift detects the custom FeatureGate configuration. | ||
6. MicroShift writes a sentinel file to `/var/lib/microshift/`, containing the feature gate config. |
Should we also log the setting? So if we receive sosreport we'll see in the history that FG was enabled? You can, after all, remove the sentinel file and bypass the upgrade prevention mechanism.
Agreed! Will add
Great feedback! Comments in-line
3. **Custom Feature Gates**: Support for individual feature gate enablement/disablement via `customNoUpgrade` configuration | ||
4. **API Server Propagation**: All configured featureGates will be passed to the kube-apiserver, which handles propagation to other Kubernetes components (kubelet, kube-controller-manager, kube-scheduler) | ||
4. **API Server Propagation**: All configured featureGates will be passed to the kube-apiserver, which handles propagation to other Kubernetes components (kubelet, kube-controller-manager, kube-scheduler). Service restarts are the responsibility of the cluster admin. | ||
5. **Prevent Feature Gate Config Changes**: OpenShift prevents users from reverting custom feature gates via spec validation rules. This is not an option for the MicroShift config. Instead, MicroShift will check for custom feature gates at startup. If customizations exist, MicroShift will write a sentinel file to `/var/lib/microshift/`. This file will contain the custom feature gates. When MicroShift next restarts, it will check for this file and overwrite the in-memory config's feature gate settings with those stored in the sentinel file. |
I mostly agree with you. The one sticking point for me is that OCP does almost the same thing as proposed here by silently reverting user changes to the FeatureGates API object. But maybe we can consider MicroShift enough of a different beast to justify not starting. Or perhaps some very loud logging would be sufficient?
|
||
**Note**: MicroShift will not overwrite `/etc/microshift/config.yaml`. Only the in-memory config will be affected. | ||
|
||
6. **Preventing Cluster Upgrades**: Upgrades on OpenShift are prevented at the cluster level by the cluster-version-operator, in conjunction with other OpenShift operators. However, MicroShift lacks these operators. Instead, MicroShift's install/upgrade logic will re-use the sentinel file described in #5. If the file exists, the cluster is un-upgradeable. |
I see, good point. Handling it this way would also create a congruent behavior to what you propose when users change feature gates.
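The upgrade guard described in item 6 reduces to an existence check on the sentinel file. A minimal sketch follows, assuming the file is named `no-upgrade` under the data directory (the name used later in this PR) and using a hypothetical function and exception name; MicroShift's actual upgrade logic may differ.

```python
from pathlib import Path

SENTINEL_NAME = "no-upgrade"  # assumed file name under the data directory


class UpgradeBlocked(Exception):
    """Raised when custom feature gates make the node un-upgradeable."""


def ensure_upgradeable(data_dir) -> None:
    """Raise UpgradeBlocked if the sentinel file exists in `data_dir`."""
    sentinel = Path(data_dir) / SENTINEL_NAME
    if sentinel.exists():
        raise UpgradeBlocked(
            f"{sentinel} exists: custom feature gates were applied, "
            "so upgrades are not supported on this node"
        )
```

Calling `ensure_upgradeable("/var/lib/microshift")` before any version migration step would give the same explicit behavior for upgrades as for feature gate config changes.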
6. The kube-apiserver propagates the feature gates to other Kubernetes components (kubelet, kube-controller-manager, kube-scheduler) | ||
7. Each component processes the featureGates and enables/disables the features it supports according to the configured state | ||
5. MicroShift detects the custom FeatureGate configuration. | ||
6. MicroShift writes a sentinel file to `/var/lib/microshift/`, containing the feature gate config. |
Agreed! Will add
6. MicroShift logs a warning that custom feature gates cannot be reverted once applied | ||
7. The cluster continues to run with the original custom feature gates despite the configuration change attempt | ||
|
||
##### Attempt to Upgrade Cluster with Custom Feature Gates |
Yes, ty! Will fix
4. Upgrade fails to proceed, preserving the current MicroShift version | ||
5. Administrator must either: | ||
- Continue using the current version with custom feature gates | ||
- Wipe MicroShift's state (`$ sudo microshift-cleanup-data --all`) and restart MicroShift service (`$ sudo systemctl restart microshift`) |
To add: They are not guaranteed to be backwards compatible.
FeatureGates represent mostly alpha and beta features, meaning they are in active development and could differ significantly between minor or patch versions. Users may open support tickets to address this but more than likely the engineering team can't do much about it.
|
||
#### Standalone Clusters | ||
|
||
This enhancement is primarily designed for standalone MicroShift deployments where administrators need direct control over feature gate configuration through the local configuration file. |
I'll reword this to be more explicit. FeatureGates should never be used in production environments. They exist to allow developers and users to experiment with k8s alpha/beta features before they become part of k8s by default.
- kube-controller-manager | ||
- kube-scheduler | ||
|
||
The resource consumption impact will be minimal as this enhancement only adds configuration parsing and pass-through functionality. The actual resource impact will depend on which feature gates are enabled by users and their specific behaviors. |
No, and I also don't think it's data we would want to maintain in the long term. Because of the experimental nature of FGs, it's more up to the user to analyze the benefits for their particular environment. The proposed feature is itself not at all resource intensive, while the featureGates a user sets may alter resource overhead.
Invalid feature gate configurations in the MicroShift configuration file could prevent MicroShift components from starting. | ||
|
||
*Mitigation:* Kubernetes components inherently ignore unrecognized feature gate names, so typos or misspellings may not cause failures. Only invalid values for recognized gates can cause issues. Components provide clear error messages for such cases, and documentation will guide troubleshooting. Recommended that users run `microshift-cleanup-script`, delete the custom feature gates from `/etc/microshift/config.yaml` and restart the MicroShift service. | ||
*Mitigation:* Kubernetes components inherently ignore unrecognized feature gate names, so typos or misspellings may not cause failures. Components provide clear warning messages for such cases, and documentation will guide troubleshooting. Recommended that users run `microshift-cleanup-script`, correct the invalid config values in `/etc/microshift/config.yaml`, then restart the service. |
It's hard to say but I think we don't need to err this far on the side of caution. Will remove the cleanup.
|
||
## Test Plan | ||
|
||
The testing strategy focuses on verifying the propagation functionality - that custom feature gate configurations are correctly parsed from the MicroShift configuration file and passed to the kube-apiserver, which then handles propagation to other Kubernetes components. Testing validates the parsing and delivery mechanism rather than feature gate functionality itself. |
Yes, ty. Will add
|
||
## Graduation Criteria | ||
|
||
The feature is planned to be released as GA directly. |
> So if I understand, we'll support using this feature, but not necessarily whatever is enabled with it?
Correct. The MicroShift feature is the ability to configure feature gates, not the feature gates themselves.
> How do we define the support scope so it's clear to everyone?
I need to add more information on this. This OpenShift article provides the best detail I've found so far on support. However, it only distinguishes between DevPreview and TechPreview (sets of feature gates), and not individual feature gates.
- Upgrade is blocked with an error message indicating custom feature gates prevent upgrades | ||
4. Upgrade fails to proceed, preserving the current MicroShift version | ||
4. MicroShift detects `/var/lib/microshift/no-upgrade` with differing feature gate config. | ||
5. MicroShift logs a fatal error that custom feature gates cannot be reverted or changed once applied |
Can we allow changing to another non-upgradable FG set? For example, freely change between `TechPreviewNoUpgrade`, `DevPreviewNoUpgrade`, and `customNoUpgrade`, but prevent resetting them to an upgradable setting? Or is this also prohibited by OpenShift?
We could, but we'd be diverging from OpenShift's featureset rules, which I don't think we'd want.
No more Q from me |
4. If `/var/lib/microshift/no-upgrade` exists (indicating custom feature gates are configured), then the upgrade is blocked with a fatal error indicating custom feature gates break upgrades | ||
5. Administrator must either: | ||
- Revert the upgrade back to the prior version, OR | ||
- Wipe MicroShift's state (`$ sudo microshift-cleanup-data --all`) and restart MicroShift service (`$ sudo systemctl restart microshift`). This returns the node to a supported state. |
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: pacevedom The full list of commands accepted by this bot can be found here. The pull request process is described here
@copejon: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |