feature: implement simple checker operator using kubebuilder #2
Conversation
velp left a comment
Thank you for your time. The result is really impressive. I can see that you tried to do the work as close as possible to real-world conditions and requirements. I would like to discuss several of the solutions; I left one of them as a comment in the review.
},
Command: []string{"sh", "-c"},
Args: []string{
    "curl -o /dev/null -s -w \"%{http_code}\" ${TARGET_URL}",
I expected to see a permanently living pod that is created by the controller and whose existence is monitored by the controller. But your solution is also OK - a more Kubernetes-native way.
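For context, a rough sketch of the CronJob-based flow this PR appears to take (the reconcile logs later in the thread show a ConfigMap and a CronJob being created per Checker); the schedule, names, and the inlined TARGET_URL are assumptions for illustration, not the actual code under review:

```go
package controller

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// desiredCronJob builds the per-Checker CronJob that runs the curl probe.
// Instead of keeping a long-lived pod alive, the controller lets the CronJob
// controller spawn a short-lived pod on every tick.
func desiredCronJob(name, namespace, targetURL string) *batchv1.CronJob {
	return &batchv1.CronJob{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
		Spec: batchv1.CronJobSpec{
			Schedule: "* * * * *", // assumed: probe once a minute
			JobTemplate: batchv1.JobTemplateSpec{
				Spec: batchv1.JobSpec{
					Template: corev1.PodTemplateSpec{
						Spec: corev1.PodSpec{
							RestartPolicy: corev1.RestartPolicyNever,
							Containers: []corev1.Container{{
								Name:    "curl",
								Image:   "curlimages/curl:8.10.1",
								Command: []string{"sh", "-c"},
								Args:    []string{`curl -o /dev/null -s -w "%{http_code}" ${TARGET_URL}`},
								// Assumed: in the PR the URL appears to come from a ConfigMap;
								// it is inlined here only to keep the sketch self-contained.
								Env: []corev1.EnvVar{{Name: "TARGET_URL", Value: targetURL}},
							}},
						},
					},
				},
			},
		},
	}
}
```

The trade-off versus a long-lived pod is that every probe runs in a fresh short-lived pod, so the controller has to locate and read the pods the CronJob leaves behind.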
I've tested your solution a bit with:
kubectl apply -f ./example/operator.yaml
kubectl create ns checker
kubectl apply -f ./example/example.yaml
and I got errors:
$ kubectl get pod -n checker
NAME READY STATUS RESTARTS AGE
github-28804985-9mrff 0/1 ContainerCreating 0 1s
github-28804985-x429t 0/1 Error 0 21s
$ kubectl logs -n checker github-28804985-x429t
000
I can reproduce it manually, and it looks like a problem with the curl image (DNS resolution works in the cluster, but curl cannot resolve):
$ kubectl run -i --tty --rm debug --image=curlimages/curl:8.10.1 --restart=Never -- sh
If you don't see a command prompt, try pressing enter.
~ $ curl -o /dev/null -s -w "%{http_code}" https://github.com/
000
~ $ curl https://github.com/
curl: (6) Could not resolve host: github.com
~ $ nslookup github.com
Server: 169.254.25.10
Address: 169.254.25.10:53
Non-authoritative answer:
Non-authoritative answer:
Name: github.com
Address: 140.82.121.4
The problem with the image is out of scope for this task, but I see an issue with error handling in the operator, because at the same time in the operator logs I see:
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 2024-10-07T11:12:08Z INFO Status updated successfully {"Checker.Name": "github"}
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 2024-10-07T11:12:23Z INFO Trigger reconcile: {"req.NamespacedName": {"name":"github","namespace":"checker"}}
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 2024-10-07T11:12:23Z INFO ConfigMap already exists, skipping creation {"ConfigMap.Name": "github"}
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 2024-10-07T11:12:23Z INFO ConfigMap updated successfully {"ConfigMap.Name": "github"}
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 2024-10-07T11:12:23Z INFO CronJob already exists, skipping creation {"CronJob.Name": "github"}
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 2024-10-07T11:12:23Z INFO CronJob updated successfully {"CronJob.Name": "github"}
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 2024-10-07T11:12:23Z INFO Looking up logs from found latest pod {"latestPod.Name": "github-28804992-lc4k9"}
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 2024-10-07T11:12:23Z ERROR Could not get logs from pod {"error": "container \"curl\" in pod \"github-28804992-lc4k9\" is waiting to start: ContainerCreating"}
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager github.com/cloudification-io/github-checker-operator/internal/controller.(*CheckerReconciler).UpdateStatus
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager /workspace/internal/controller/utils.go:208
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager github.com/cloudification-io/github-checker-operator/internal/controller.(*CheckerReconciler).Reconcile
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager /workspace/internal/controller/checker_controller.go:73
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:303
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224
and from these logs I cannot understand what is going on.
What I expected to see is clear information that the domain could not be resolved.
Any suggestions on how we can improve that?
OK @velp, I see two things going on; let's start with the easier one:
- ERROR Could not get logs from pod {"error": "container \"curl\" in pod \"github-28804992-lc4k9\" is waiting to start: ContainerCreating"}: it looks like I did not properly handle waiting for the pod to reach the Succeeded phase before fetching its logs. I can easily patch that :D (see the sketch at the end of this comment).
- I suspect the issue with curl might be related to DNS or network configuration. I have a general question about your environment: are you running the operator behind some proxy or firewall? I need a bit more information about that. Please run these commands in the pod you spawned earlier and paste the output here, so I can inspect what is going on:
nc -vz github.com 443
nslookup github.com
cat /etc/resolv.conf
For me, it looks like there’s a DNS issue or the cluster has been deployed with something like NodeLocal DNSCache enabled, which could be causing the problem.
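For the first point, a minimal sketch of what the patch could look like, assuming the controller lists the CronJob's pods and only reads logs from the newest pod that has actually reached the Succeeded phase (the label selector, helper name, and client wiring are assumptions, not the code in this PR):

```go
package controller

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// latestSucceededPod returns the newest checker pod that has finished
// successfully, or nil if no pod is ready to have its logs read yet.
// c is the controller-runtime client the reconciler already has; the
// "checker" label is an assumed way to find the CronJob's pods.
func latestSucceededPod(ctx context.Context, c client.Client, namespace, checkerName string) (*corev1.Pod, error) {
	var pods corev1.PodList
	if err := c.List(ctx, &pods,
		client.InNamespace(namespace),
		client.MatchingLabels{"checker": checkerName},
	); err != nil {
		return nil, err
	}

	var latest *corev1.Pod
	for i := range pods.Items {
		pod := &pods.Items[i]
		// Skip pods that are still creating, running, or failed: their logs are
		// either unavailable (ContainerCreating) or not meaningful for the status.
		if pod.Status.Phase != corev1.PodSucceeded {
			continue
		}
		if latest == nil || pod.CreationTimestamp.After(latest.CreationTimestamp.Time) {
			latest = pod
		}
	}
	// When latest is nil the reconciler can simply requeue and try again later
	// instead of logging a confusing "waiting to start: ContainerCreating" error.
	return latest, nil
}
```

With something like that in place, a pod stuck in ContainerCreating or Error would just cause a requeue rather than the stack trace above.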
- It cannot handle the situation when the pod is in a crash loop, or when there are two containers (one crashing while the second one is still being created):
$ kgp -n checker
NAME READY STATUS RESTARTS AGE
github-28806357-cxct2 0/1 Error 0 35s
github-28806357-gvnlv 0/1 ContainerCreating 0 9s
- The results of the requested commands are below:
~ $ nc -vz github.com 443
nc: bad address 'github.com'
~ $ nslookup github.com
Server: 169.254.25.10
Address: 169.254.25.10:53
Non-authoritative answer:
Non-authoritative answer:
Name: github.com
Address: 140.82.121.4
~ $ cat /etc/resolv.conf
search default.svc.qa.cloudification.io svc.qa.cloudification.io qa.cloudification.io
nameserver 169.254.25.10
options ndots:5
But as I said, debugging the DNS problem is out of scope for this task; my question was only about error handling. And even if the operator can fetch the logs, it will see 000 there:
kl -n checker github-28806357-cxct2
log is DEPRECATED and will be removed in a future version. Use logs instead.
000
OK @velp, so:
- The issue with fetching logs from the latest pod before it reaches the Succeeded phase has been fixed in 1c7051c.
- Whenever the operator fetches 000 from the pod, it sets the Checker CRD status to Unknown, which in my mind is a better way to inspect the status of the Checker (see the sketch after the output below):
λ k get pods
NAME READY STATUS RESTARTS AGE
github-28806509-28zmv 0/1 ErrImagePull 0 58s
λ k get checker
NAME TARGET STATUS
github Unknown
λ k get pods
NAME READY STATUS RESTARTS AGE
github-28806528-qgrrq 0/1 Completed 0 52s
λ k get checker
NAME TARGET STATUS
github 200
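For reference, a minimal sketch of that second point, assuming the Checker CRD exposes a string status field backing the STATUS column above and that the pod log contains only the code printed by curl (the API import path, field name, and helper are assumptions, not necessarily what 1c7051c contains):

```go
package controller

import (
	"context"
	"strings"

	"sigs.k8s.io/controller-runtime/pkg/client"

	// Assumed import path for the Checker API types.
	checkerv1 "github.com/cloudification-io/github-checker-operator/api/v1"
)

// updateCheckerStatus maps curl's "%{http_code}" output to the Checker status.
// curl prints "000" when no HTTP response was received at all (DNS failure,
// connection refused, timeout), so that value is reported as "Unknown" instead
// of being shown as if it were a real response code.
func updateCheckerStatus(ctx context.Context, c client.Client, checker *checkerv1.Checker, podLog string) error {
	code := strings.TrimSpace(podLog)
	if code == "" || code == "000" {
		checker.Status.Status = "Unknown" // assumed status field backing the STATUS column
	} else {
		checker.Status.Status = code // e.g. "200", "404"
	}
	return c.Status().Update(ctx, checker)
}
```

A further refinement could be to also emit a Kubernetes Event or a status condition carrying the reason (for example, that the target could not be resolved), so the failure cause is visible directly from the Checker resource.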
velp left a comment
OK, thank you for your time. Dmitry will contact you soon.
This PR introduces a simple checker operator, built with Kubebuilder, that monitors the availability of any given website and records HTTP response codes. Static YAML manifests for the operator are included, and Docker images for both amd64 and arm64 architectures are pushed to Docker Hub.