Description
⚠️ Cluster API maintainers can ask to turn an issue-proposal into a CAEP when necessary, this is to be expected for large changes that impact multiple components, breaking changes, or new large features.
Goals
- Provide a definitive way to determine whether a Machine bootstrapped successfully
Non-Goals/Future Work
- N/A
User Story
As a user, I would like to know if a Machine failed to bootstrap, so that I don't have a node joining the cluster that may not be fully functional.
Detailed Description
The contract for infrastructure providers currently has no way to indicate whether machine bootstrapping succeeded or failed. We have seen multiple instances where (with the kubeadm bootstrap provider) cloud-init runs, kubeadm join executes, and a new Node joins the workload cluster, even though kubeadm actually failed (exited with a non-zero code). In some circumstances, the Node joins the cluster but may be missing default taints/labels (such as the master node role).
I'd like us to find a way for an infrastructure provider to report if bootstrapping succeeded or failed. The exact manner by which each infrastructure provider checks for success or failure will probably need to vary, but we should be able to define a common status field in each "infrastructure machine" that indicates success or failure.
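To make the idea of a common status field concrete, here is a minimal sketch in Go. Everything in it is hypothetical: `BootstrapState`, `InfraMachineStatus`, and `bootstrapStatusFromExitCode` are illustrative names, not part of any existing Cluster API contract. It assumes the provider can somehow observe the exit code of the bootstrap command, and how it does that is exactly the provider-specific part.

```go
package main

import "fmt"

// BootstrapState is a hypothetical tri-state value; the name and its values
// are illustrative only and not part of the current contract.
type BootstrapState string

const (
	BootstrapUnknown   BootstrapState = "Unknown"
	BootstrapSucceeded BootstrapState = "Succeeded"
	BootstrapFailed    BootstrapState = "Failed"
)

// InfraMachineStatus stands in for the status of any provider's
// "infrastructure machine" type (assumed shape, for illustration).
type InfraMachineStatus struct {
	Bootstrap        BootstrapState
	BootstrapMessage string
}

// bootstrapStatusFromExitCode maps the exit code of the bootstrap command to
// the common status. A nil exit code means the provider has not observed a
// result yet.
func bootstrapStatusFromExitCode(exitCode *int) InfraMachineStatus {
	switch {
	case exitCode == nil:
		return InfraMachineStatus{Bootstrap: BootstrapUnknown}
	case *exitCode == 0:
		return InfraMachineStatus{Bootstrap: BootstrapSucceeded}
	default:
		return InfraMachineStatus{
			Bootstrap:        BootstrapFailed,
			BootstrapMessage: fmt.Sprintf("bootstrap command exited with code %d", *exitCode),
		}
	}
}

func main() {
	code := 1
	fmt.Printf("%+v\n", bootstrapStatusFromExitCode(&code))
}
```

The explicit `Unknown` state matters here: a Node can join the cluster before the provider has learned the bootstrap result, so "not yet reported" should be distinguishable from both success and failure.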
I'm not sure what we should do exactly around remediating Machines in this state. We could potentially integrate with MachineHealthCheck, and we'll need to figure something out for KubeadmControlPlane Machines too.
Contract changes [optional]
- All infrastructure providers must populate the new field/condition described below based on bootstrap success or failure
- Machine controller copies field/condition to its own status
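The two contract changes above could be sketched roughly as follows, using the condition variant. This is an assumption-laden illustration, not the Machine controller's actual code: the `Condition` struct, the `BootstrapSucceeded` condition type, the `NotReported` reason, and `mirrorBootstrapCondition` are all hypothetical names invented for this example.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a status condition.
type Condition struct {
	Type    string
	Status  string // "True", "False", or "Unknown"
	Reason  string
	Message string
}

// BootstrapSucceededType is an assumed condition type name.
const BootstrapSucceededType = "BootstrapSucceeded"

// mirrorBootstrapCondition returns a copy of the Machine's conditions with the
// bootstrap condition taken from the infrastructure machine. If the provider
// has not reported one, the Machine records the condition as Unknown.
func mirrorBootstrapCondition(infra, machine []Condition) []Condition {
	mirrored := Condition{Type: BootstrapSucceededType, Status: "Unknown", Reason: "NotReported"}
	for _, c := range infra {
		if c.Type == BootstrapSucceededType {
			mirrored = c
			break
		}
	}

	// Build a new slice so the caller's Machine conditions are not mutated.
	out := make([]Condition, 0, len(machine)+1)
	replaced := false
	for _, c := range machine {
		if c.Type == BootstrapSucceededType {
			out = append(out, mirrored)
			replaced = true
			continue
		}
		out = append(out, c)
	}
	if !replaced {
		out = append(out, mirrored)
	}
	return out
}

func main() {
	infra := []Condition{{
		Type:    BootstrapSucceededType,
		Status:  "False",
		Reason:  "BootstrapFailed",
		Message: "kubeadm join exited with a non-zero code",
	}}
	machine := []Condition{{Type: "Ready", Status: "True"}}
	for _, c := range mirrorBootstrapCondition(infra, machine) {
		fmt.Printf("%s=%s (%s)\n", c.Type, c.Status, c.Reason)
	}
}
```

Once the condition is on the Machine itself, anything that already watches Machine status (for example a remediation mechanism) would have a single place to look, regardless of which infrastructure provider produced it.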
Data model changes [optional]
- Add a way (status field or condition) for infrastructure machine CRDs to indicate bootstrap succeeded/failed
/kind proposal