
Conversation

@AkihiroSuda (Member) commented Oct 1, 2025

@AkihiroSuda added this to the v2.0.0 milestone on Oct 1, 2025
@AkihiroSuda force-pushed the fix-3237 branch 2 times, most recently from 2e45090 to f74fbdf on October 1, 2025 10:42
@AkihiroSuda (Member, Author) commented:

Before (964fb30)

$ du -hs _output/
128M    _output/

$ ls -lh _output/bin/limactl _output/share/lima/lima-guestagent.Linux-*
-rwxr-xr-x@ 1 suda  staff    28M Oct  1 19:47 _output/bin/limactl*
-rw-r--r--@ 1 suda  staff    14M Oct  1 19:47 _output/share/lima/lima-guestagent.Linux-aarch64.gz
-rw-r--r--@ 1 suda  staff    15M Oct  1 19:47 _output/share/lima/lima-guestagent.Linux-armv7l.gz
-rw-r--r--@ 1 suda  staff    14M Oct  1 19:47 _output/share/lima/lima-guestagent.Linux-ppc64le.gz
-rw-r--r--@ 1 suda  staff    15M Oct  1 19:47 _output/share/lima/lima-guestagent.Linux-riscv64.gz
-rw-r--r--@ 1 suda  staff    16M Oct  1 19:47 _output/share/lima/lima-guestagent.Linux-s390x.gz
-rw-r--r--@ 1 suda  staff    16M Oct  1 19:47 _output/share/lima/lima-guestagent.Linux-x86_64.gz

After (f74fbdf)

$ du -hs _output/
 88M    _output/

$ ls -lh _output/bin/limactl _output/share/lima/lima-guestagent.Linux-*
-rwxr-xr-x@ 1 suda  staff    28M Oct  1 19:49 _output/bin/limactl*
-rw-r--r--@ 1 suda  staff   8.4M Oct  1 19:49 _output/share/lima/lima-guestagent.Linux-aarch64.gz
-rw-r--r--@ 1 suda  staff   8.8M Oct  1 19:49 _output/share/lima/lima-guestagent.Linux-armv7l.gz
-rw-r--r--@ 1 suda  staff   8.4M Oct  1 19:49 _output/share/lima/lima-guestagent.Linux-ppc64le.gz
-rw-r--r--@ 1 suda  staff   8.8M Oct  1 19:49 _output/share/lima/lima-guestagent.Linux-riscv64.gz
-rw-r--r--@ 1 suda  staff   9.2M Oct  1 19:49 _output/share/lima/lima-guestagent.Linux-s390x.gz
-rw-r--r--@ 1 suda  staff   9.3M Oct  1 19:49 _output/share/lima/lima-guestagent.Linux-x86_64.gz

@jandubois (Member) commented:

I have not looked at this PR at all yet, but wanted to mention a couple of things I discussed with @Nino-K as requirements for his port monitoring PR:

  • Keep retrying to connect to k8s indefinitely, as it will not be running yet by the time the guest agent starts.
  • When the connection breaks, keep trying to reconnect with a short delay indefinitely, as the user may have stopped and restarted k8s.

For this PR as well: the kubectl binary may not yet be available on the PATH when you first try to invoke it, so keep trying. The call may also fail because the port is open but the apiserver is not yet responding, or because the kubeconfig is missing, etc. The retry on broken connections should handle this automatically.

Because of the indefinite retries, the Kubernetes watcher should be opt-in (configurable in lima.yaml), so it only runs when the VM is known to run Kubernetes.
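
For illustration, the retry behavior described above amounts to something like the following sketch (watchServices, the logger, and the 3-second delay are assumptions for the example, not the actual implementation):

package watcher

import (
    "context"
    "errors"
    "log"
    "time"
)

// watchServices stands in for whatever connects to the apiserver and watches
// services; the real implementation would return when the connection cannot
// be established or breaks.
func watchServices(ctx context.Context) error {
    return errors.New("kubernetes is not reachable yet")
}

// watchForever retries indefinitely with a short delay, both before
// Kubernetes is up for the first time and after the connection breaks.
func watchForever(ctx context.Context) {
    for {
        if err := watchServices(ctx); err != nil {
            log.Printf("service watch failed, will retry: %v", err)
        }
        select {
        case <-ctx.Done():
            return
        case <-time.After(3 * time.Second):
        }
    }
}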

@AkihiroSuda marked this pull request as ready for review on October 3, 2025 06:53
@AkihiroSuda (Member, Author) commented:

> The retry on broken connections should handle this automatically.

Yes, this is retried.

> Because of the indefinite retries, the Kubernetes watcher should be opt-in

It hasn't been opt-in so far, and I don't think it needs to be, as the overhead of polling LookPath("kubectl") seems trivial.
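
For reference, the polling in question is roughly this shape (a sketch; the 10-second interval and function name are made up for illustration):

package watcher

import (
    "context"
    "os/exec"
    "time"
)

// waitForKubectl polls until kubectl shows up on the PATH, then returns its
// location. The cost per iteration is a handful of stat() calls.
func waitForKubectl(ctx context.Context) (string, error) {
    ticker := time.NewTicker(10 * time.Second)
    defer ticker.Stop()
    for {
        if path, err := exec.LookPath("kubectl"); err == nil {
            return path, nil
        }
        select {
        case <-ctx.Done():
            return "", ctx.Err()
        case <-ticker.C:
        }
    }
}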

The resource constraint should be set by the caller via
`$LIMACTL_CREATE_ARGS`.

Signed-off-by: Akihiro Suda <[email protected]>
Part of issue 3237

TODO: drop dependency on k8s.io/api

Signed-off-by: Akihiro Suda <[email protected]>
@nirs (Member) left a comment

Trimming the guest agent is nice; anything using client-go becomes huge quickly. But did you measure memory and CPU usage before and after this change?

With the new code we always keep a kubectl watch command running, which has similar CPU usage to what we had before in the guest agent, but now we format the JSON events and parse them back in the guest agent, and we keep all services in memory twice: once in kubectl (via the informer) and once in the guest agent.

set -x
limactl shell "$NAME" kubectl get nodes -o wide
limactl shell "$NAME" kubectl create deployment nginx --image="${nginx_image}"
limactl shell "$NAME" kubectl create service nodeport nginx --node-port=31080 --tcp=80:80
@nirs (Member) commented:

Why not use a YAML file with the deployment and service?

limactl shell "$NAME" kubectl get nodes -o wide
limactl shell "$NAME" kubectl create deployment nginx --image="${nginx_image}"
limactl shell "$NAME" kubectl create service nodeport nginx --node-port=31080 --tcp=80:80
timeout 3m bash -euxc "until curl -f --retry 30 --retry-connrefused http://127.0.0.1:31080; do sleep 3; done"
@nirs (Member) commented:

Checking the connection makes sense only after the deployment is available. I would do this:

kubectl apply -f nginx.yaml
kubectl rollout status deployment nginx --timeout 60s

At this point the service is typically not available, but it should be within a few seconds, so we can start checking every second.

Since we wait separately for the deployment, we don't need to wait 3 minutes for the connection; maybe 30 seconds is enough to detect broken port forwarding.

Notes for the curl command:

  • Using --fail will be more readable
  • Adding --silent will avoid unhelpful noise in the test logs

func (s *ServiceWatcher) readKubectlStream(r io.Reader) error {
scanner := bufio.NewScanner(r)
// increase buffer in case of large JSON objects
const maxBuf = 10 * 1024 * 1024
@nirs (Member) commented:

Don't we have unit constants in lima (e.g. KiB, MiB, GiB)? They would make the code more readable.
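
If lima does not define them yet, a small set of untyped constants would do (a sketch; they may already exist somewhere in the tree):

const (
    KiB = 1 << 10
    MiB = 1 << 20
    GiB = 1 << 30
)

// e.g. const maxBuf = 10 * MiB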

line := scanner.Bytes()
line = bytes.TrimSpace(line)
if len(line) == 0 {
continue
@nirs (Member) commented:

Do we expect empty lines in the JSON stream?

cache.WaitForCacheSync(ctx.Done(), serviceInformer.HasSynced)
go func() {
for i := 0; ; i++ {
if i > 0 {
@nirs (Member) commented:

Why do you skip the first iteration?

The code will be clearer if we separate the loop from the action performed for each iteration.
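
Assuming the i > 0 check only exists to delay retries but not the first attempt, putting the delay at the end of the loop body and moving the per-iteration work into its own method would express the same thing more directly (a sketch; runWatch and retryInterval are placeholders, not names from this PR):

func (s *ServiceWatcher) loop(ctx context.Context) {
    for {
        // runWatch performs one watch attempt and returns when it fails or ends.
        if err := s.runWatch(ctx); err != nil {
            logrus.WithError(err).Warn("kubectl watch exited, retrying")
        }
        // Delay only between attempts; the first attempt starts immediately.
        select {
        case <-ctx.Done():
            return
        case <-time.After(retryInterval):
        }
    }
}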


if kubeconfig != "" {
cmd.Env = append(os.Environ(), "KUBECONFIG="+kubeconfig)
}
@nirs (Member) commented:

It is simpler and more efficient to use --kubeconfig=
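
i.e. something along these lines (a sketch; the exact kubectl arguments are illustrative, not taken from the PR):

args := []string{"get", "services", "--all-namespaces", "--watch", "--output=json"}
if kubeconfig != "" {
    // Pass the path explicitly instead of mutating the environment.
    args = append(args, "--kubeconfig="+kubeconfig)
}
cmd := exec.CommandContext(ctx, "kubectl", args...)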

serviceInformer := informerFactory.Core().V1().Services().Informer()
informerFactory.Start(ctx.Done())
cache.WaitForCacheSync(ctx.Done(), serviceInformer.HasSynced)
go func() {
@nirs (Member) commented:

Moving the loop into its own function will make the code simpler and easier to follow:

func (s *ServiceWatcher) Start(ctx context.Context) {
    ...
    go s.loop(ctx)
}

func (s *ServiceWatcher) loop(ctx context.Context) {
    for {
        ...
    }
}

s.rwMutex.Lock()
switch ev.Type {
case watch.Added, watch.Modified:
s.services[key] = &svc
@nirs (Member) commented:

Do we need to keep all the services in memory? We can extract the relevant details here and minimize memory usage.
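
For example, a trimmed record could keep only what the port forwarder consumes (a hypothetical struct; the exact fields depend on what the forwarding code actually reads from the Service object):

// forwardedService keeps just the fields needed for port forwarding,
// instead of retaining the whole corev1.Service object.
type forwardedService struct {
    Namespace string
    Name      string
    ClusterIP string
    Ports     []forwardedPort
}

type forwardedPort struct {
    Port     int32
    NodePort int32
    Protocol string
}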
