
Conversation

@patrostkowski commented Oct 6, 2024

This PR introduces a simple Kubernetes operator, built with Kubebuilder, that monitors the availability of any given website and logs the HTTP response codes. Static YAML manifests for the operator are included, and Docker images for both amd64 and arm64 architectures are pushed to Docker Hub.

Collaborator

@velp left a comment

Thank you for your time. The result is really impressive. I can see that you tried to do the work as close as possible to real conditions and requirements. I would like to discuss several of the solutions; I left one of them as a comment in the review.

},
Command: []string{"sh", "-c"},
Args: []string{
"curl -o /dev/null -s -w \"%{http_code}\" ${TARGET_URL}",

Collaborator

I expected to see a permanently running Pod that is created by the controller and whose existence is monitored by the controller. But your solution is also OK - a more Kubernetes-native way.

I've tested your solution a bit with:

kubectl apply -f ./example/operator.yaml
kubectl create ns checker
kubectl apply -f ./example/example.yaml

and I got errors:

$ kubectl get pod -n checker
NAME                    READY   STATUS              RESTARTS   AGE
github-28804985-9mrff   0/1     ContainerCreating   0          1s
github-28804985-x429t   0/1     Error               0          21s

$ kubectl logs -n checker github-28804985-x429t
000

I can reproduce it manually, and it looks like some problem with the curl image (DNS resolution is working in the cluster, but curl does not resolve):

$ kubectl run -i --tty --rm debug --image=curlimages/curl:8.10.1 --restart=Never -- sh
If you don't see a command prompt, try pressing enter.
~ $ curl -o /dev/null -s -w "%{http_code}" https://github.com/
000
~ $ curl https://github.com/
curl: (6) Could not resolve host: github.com
~ $ nslookup github.com
Server:		169.254.25.10
Address:	169.254.25.10:53

Non-authoritative answer:

Non-authoritative answer:
Name:	github.com
Address: 140.82.121.4

The problem with the image is out of scope for this task, but I see an issue with error handling in the operator, because at the same time I see this in the operator logs:

github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 2024-10-07T11:12:08Z	INFO	Status updated successfully	{"Checker.Name": "github"}
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 2024-10-07T11:12:23Z	INFO	Trigger reconcile:	{"req.NamespacedName": {"name":"github","namespace":"checker"}}
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 2024-10-07T11:12:23Z	INFO	ConfigMap already exists, skipping creation	{"ConfigMap.Name": "github"}
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 2024-10-07T11:12:23Z	INFO	ConfigMap updated successfully	{"ConfigMap.Name": "github"}
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 2024-10-07T11:12:23Z	INFO	CronJob already exists, skipping creation	{"CronJob.Name": "github"}
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 2024-10-07T11:12:23Z	INFO	CronJob updated successfully	{"CronJob.Name": "github"}
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 2024-10-07T11:12:23Z	INFO	Looking up logs from found latest pod	{"latestPod.Name": "github-28804992-lc4k9"}
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 2024-10-07T11:12:23Z	ERROR	Could not get logs from pod	{"error": "container \"curl\" in pod \"github-28804992-lc4k9\" is waiting to start: ContainerCreating"}
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager github.com/cloudification-io/github-checker-operator/internal/controller.(*CheckerReconciler).UpdateStatus
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 	/workspace/internal/controller/utils.go:208
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager github.com/cloudification-io/github-checker-operator/internal/controller.(*CheckerReconciler).Reconcile
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 	/workspace/internal/controller/checker_controller.go:73
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:303
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2
github-checker-operator-controller-manager-5c6454d5b6-smtqn manager 	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224

and from these logs I cannot understand what is going on.

What I expected to see is clear information that the domain was not resolved.

Any suggestions how we can improve that?

Author

OK @velp, I see two things going on; let's start with the easier one:

  1. ERROR Could not get logs from pod {"error": "container \"curl\" in pod \"github-28804992-lc4k9\" is waiting to start: ContainerCreating"}: it looks like I did not properly handle waiting for the pod to reach the Succeeded phase before fetching logs. I can easily patch that :D (a rough sketch of the idea is at the end of this comment).

  2. I suspect the issue with curl might be related to DNS or network configuration. I have a general question about your environment: are you running the operator behind some proxy or firewall? I need to ask you to provide some more info about that. Please run these commands in the pod you spawned before and paste the output here so I can inspect what's going on:

nc -vz github.com 443
nslookup github.com
cat /etc/resolv.conf

For me, it looks like there’s a DNS issue or the cluster has been deployed with something like NodeLocal DNSCache enabled, which could be causing the problem.
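
As a rough sketch of the fix for point 1 (illustrative only; the helper name and surrounding reconciler wiring are assumptions, not the actual patch), the idea is to check the pod phase and fetch logs only once the pod has actually finished, requeueing otherwise:

package controller

import (
	corev1 "k8s.io/api/core/v1"
)

// podFinished reports whether the checker pod has reached a terminal phase.
// The reconciler would only read logs once this returns true and would
// requeue otherwise, instead of failing with
// "waiting to start: ContainerCreating" and a stack trace.
func podFinished(pod *corev1.Pod) bool {
	return pod.Status.Phase == corev1.PodSucceeded ||
		pod.Status.Phase == corev1.PodFailed
}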

Collaborator

  1. It cannot handle the situation when the Pod is in a crash loop or when there are two containers (one is crashing, the second one is being created):
$ kgp -n checker
NAME                    READY   STATUS              RESTARTS   AGE
github-28806357-cxct2   0/1     Error               0          35s
github-28806357-gvnlv   0/1     ContainerCreating   0          9s
  2. Results of the commands below:
~ $ nc -vz github.com 443
nc: bad address 'github.com'
~ $ nslookup github.com
Server:		169.254.25.10
Address:	169.254.25.10:53

Non-authoritative answer:

Non-authoritative answer:
Name:	github.com
Address: 140.82.121.4

~ $ cat /etc/resolv.conf
search default.svc.qa.cloudification.io svc.qa.cloudification.io qa.cloudification.io
nameserver 169.254.25.10
options ndots:5

But as I said, debugging the DNS problem is out of scope for this task; my question was only about error handling. And even if the operator can fetch the logs, it will see 000 there:

kl -n checker github-28806357-cxct2
log is DEPRECATED and will be removed in a future version. Use logs instead.
000

Author

OK @velp, so:

  1. The issue with fetching logs from the latest pod before it reaches the Succeeded phase has been fixed in 1c7051c.

  2. Whenever the operator fetches 000 from the pod, it sets the Checker CRD status to Unknown, which in my mind is a better way to inspect the status of the Checker (a rough sketch of this mapping is shown after the output below):

λ k get pods
NAME                    READY   STATUS         RESTARTS   AGE
github-28806509-28zmv   0/1     ErrImagePull   0          58s
λ k get checker
NAME     TARGET STATUS
github   Unknown
λ k get pods
NAME                    READY   STATUS      RESTARTS   AGE
github-28806528-qgrrq   0/1     Completed   0          52s
λ k get checker
NAME     TARGET STATUS
github   200
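
As a rough sketch of that mapping (the helper name is hypothetical; the real code in the operator may differ), the idea is to treat curl's 000 output, which it prints when the request never completed (for example on a DNS failure), as an Unknown target status rather than as a real HTTP code:

package controller

import "strings"

// targetStatusFromLogs translates the raw curl output ("%{http_code}")
// into the value stored in the Checker status. curl prints "000" when
// the request never completed (e.g. the host could not be resolved),
// so that case is reported as "Unknown" instead of an HTTP code.
func targetStatusFromLogs(logs string) string {
	code := strings.TrimSpace(logs)
	if code == "" || code == "000" {
		return "Unknown"
	}
	return code
}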

Collaborator

@velp left a comment

OK, thank you for your time. Dmitry will contact you soon.
