Description
What steps did you take and what happened:
This issue was originally reported in Slack: https://kubernetes.slack.com/archives/C8TSNPY4T/p1667740494784379
Has anyone hit an error where the ClusterResourceSet controller applies the objects in a ClusterResourceSet too early, before the Service kubernetes has been created in the workload cluster? I hit this once this week while using a ClusterResourceSet to deploy kapp-controller, which contains the Service kapp-controller/packaging-api. That Service was assigned the IP "10.96.0.1", and creating the Service kubernetes then failed because of the service IP conflict.
# k logs -n kube-system       kube-apiserver-mycluster-controlplane-pl4vn
E1106 12:09:27.196308       1 controller.go:240] unable to sync kubernetes service: Service "kubernetes" is invalid: spec.clusterIPs: Invalid value: []string{"10.96.0.1"}: failed to allocate IP 10.96.0.1: provided IP is already allocated
E1106 12:09:37.197558       1 controller.go:240] unable to sync kubernetes service: Service "kubernetes" is invalid: spec.clusterIPs: Invalid value: []string{"10.96.0.1"}: failed to allocate IP 10.96.0.1: provided IP is already allocated
# k get svc -A
NAMESPACE         NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kapp-controller   packaging-api   ClusterIP   10.96.0.1    <none>        443/TCP                  2d2h
kube-system       kube-dns        ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   2d2h
# k get node
NAME                           STATUS     ROLES           AGE    VERSION
mycluster-controlplane-pl4vn   NotReady   control-plane   2d3h   v1.24.4
mycluster-workergroup1-ccfcz   NotReady   <none>          2d3h   v1.24.4
mycluster-workergroup1-lmx7b   NotReady   <none>          2d3h   v1.24.4
Creation timestamp of the packaging-api Service object:
# k get svc -n kapp-controller   packaging-api -oyaml |grep creationTimestamp
  creationTimestamp: "2022-11-04T09:37:14Z"
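To illustrate why this particular address matters: the API server assigns the Service kubernetes the first usable IP of the service CIDR. The sketch below computes that address. It assumes the service CIDR is 10.96.0.0/12 (the kubeadm default, not stated in this report), and `firstServiceIP` is a hypothetical helper name, not part of cluster-api or Kubernetes.

```go
package main

import (
	"fmt"
	"net/netip"
)

// firstServiceIP returns the first usable address of a service CIDR,
// i.e. the network base address plus one. This is the address the API
// server tries to reserve for the "kubernetes" Service.
// Hypothetical helper for illustration only.
func firstServiceIP(cidr string) (netip.Addr, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return netip.Addr{}, fmt.Errorf("parsing service CIDR %q: %w", cidr, err)
	}
	// Masked() zeroes the host bits, so Addr() is the network base address.
	return prefix.Masked().Addr().Next(), nil
}

func main() {
	// Assumption: the workload cluster uses the kubeadm default service CIDR.
	ip, err := firstServiceIP("10.96.0.0/12")
	if err != nil {
		panic(err)
	}
	fmt.Println(ip) // 10.96.0.1
}
```

If any other Service is created before the API server has reserved this address, the allocator can hand it out, which matches the "provided IP is already allocated" errors in the kube-apiserver log above.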
It seems the CRS controller only gets the remote client for the workload cluster and does not check whether the Service kubernetes has been created there before applying resources:
https://github.com/kubernetes-sigs/cluster-api/blob/v1.2.7/exp/addons/internal/controllers/clusterresourceset_controller.go#L239-L247
What did you expect to happen:
The kapp-controller CRS should be applied successfully.
Anything else you would like to add:
We tried to work around this issue by adding wait logic before applying the CRS objects, like this:
// Look up the default/kubernetes Service in the workload cluster.
err = wlcClient.Get(ctx, apitypes.NamespacedName{
	Namespace: metav1.NamespaceDefault,
	Name:      "kubernetes",
}, &corev1.Service{})
// Any error other than NotFound is a real failure; return it.
if err != nil && !apierrors.IsNotFound(err) {
	return reconcile.Result{}, err
}
// The Service does not exist yet: requeue instead of applying the CRS objects.
if apierrors.IsNotFound(err) {
	ctx.Logger.Info("Wait for the Service kubernetes to be created")
	return reconcile.Result{RequeueAfter: NormalRequeueTimeout}, nil
}
Environment:
- Cluster-api version: 1.2.7
- minikube/kind version:
- Kubernetes version: (use kubectl version):
- OS (e.g. from /etc/os-release):
/kind bug
[One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels]