NGF 2.1.1 bug when SnippetsFilter configuration is invalid #3959

@fabriziofiorucci

Describe the bug
The NGF 2.1.1 control plane breaks when an HTTPRoute references a SnippetsFilter with an invalid configuration. The only apparent way to recover is to uninstall and reinstall NGF. I reproduced this using the Helm installation on a Kubernetes 1.30.6 cluster.

To Reproduce

  1. Deploy NGF following https://github.com/f5devcentral/NGINX-Gateway-Fabric-Lab/blob/main/DEPLOYING.md. The Helm command must enable SnippetsFilters, so I used:
helm install ngf oci://ghcr.io/nginx/charts/nginx-gateway-fabric \
  --set nginx.image.repository=private-registry.nginx.com/nginx-gateway-fabric/nginx-plus \
  --set nginx.image.tag=2.1.1 \
  --set nginx.plus=true \
  --set serviceAccount.imagePullSecret=nginx-plus-registry-secret \
  --set nginx.imagePullSecret=nginx-plus-registry-secret \
  --set nginx.usage.secretName=nplus-license \
  --set nginx.service.type=NodePort \
  --set nginxGateway.snippetsFilters.enable=true \
  -n nginx-gateway
  2. Apply the attached files 1.phpapp.yaml, 2.gateway.yaml, 3.snippetsfilter.yaml and 4.httproute.yaml. The SnippetsFilter manifest contains an invalid configuration such as fastcgi_pass invalid-fqdn:9000;

  3. After applying all manifests, the resources are in this state:

$ kubectl apply -f .
configmap/phpinfo created
deployment.apps/php-fpm created
service/php-fpm created
gateway.gateway.networking.k8s.io/gateway created
snippetsfilter.gateway.nginx.org/fastcgi created
httproute.gateway.networking.k8s.io/php-fpm created

$ kubectl get pods
NAME                           READY   STATUS    RESTARTS   AGE
gateway-nginx-d6b4c56c-dk9x7   0/1     Running   0          6s
php-fpm-8b8b4cbdf-vzjf7        1/1     Running   0          7s

$ kubectl get gateway
NAME      CLASS   ADDRESS         PROGRAMMED   AGE
gateway   nginx   10.102.15.248   False        3m11s
  4. The NGF control plane pod logs:
{"level":"info","ts":"2025-09-25T10:29:29Z","logger":"provisioner","msg":"Creating/Updating nginx resources","namespace":"default","name":"gateway-nginx"}
{"level":"info","ts":"2025-09-25T10:29:29Z","logger":"eventHandler","msg":"NGINX configuration was successfully updated"}
{"level":"info","ts":"2025-09-25T10:29:29Z","logger":"provisioner","msg":"Creating/Updating nginx resources","namespace":"default","name":"gateway-nginx"}
{"level":"info","ts":"2025-09-25T10:29:29Z","logger":"provisioner","msg":"Creating/Updating nginx resources","namespace":"default","name":"gateway-nginx"}
{"level":"info","ts":"2025-09-25T10:29:29Z","logger":"provisioner","msg":"Creating/Updating nginx resources","namespace":"default","name":"gateway-nginx"}
{"level":"info","ts":"2025-09-25T10:29:29Z","logger":"provisioner","msg":"Creating/Updating nginx resources","namespace":"default","name":"gateway-nginx"}
{"level":"info","ts":"2025-09-25T10:29:29Z","logger":"provisioner","msg":"Creating/Updating nginx resources","namespace":"default","name":"gateway-nginx"}
{"level":"info","ts":"2025-09-25T10:29:30Z","logger":"provisioner","msg":"Creating/Updating nginx resources","namespace":"default","name":"gateway-nginx"}
{"level":"info","ts":"2025-09-25T10:29:30Z","logger":"eventHandler","msg":"NGINX configuration was successfully updated"}
{"level":"info","ts":"2025-09-25T10:29:30Z","logger":"eventHandler","msg":"NGINX configuration was successfully updated"}
{"level":"info","ts":"2025-09-25T10:29:30Z","logger":"eventHandler","msg":"NGINX configuration was successfully updated"}
{"level":"info","ts":"2025-09-25T10:29:31Z","logger":"eventHandler","msg":"NGINX configuration was successfully updated"}
{"level":"info","ts":"2025-09-25T10:29:36Z","logger":"nginxUpdater.commandService","msg":"Creating connection for nginx pod: gateway-nginx-d6b4c56c-dk9x7"}
{"level":"info","ts":"2025-09-25T10:29:37Z","logger":"nginxUpdater.commandService","msg":"Successfully connected to nginx agent gateway-nginx-d6b4c56c-dk9x7"}
{"level":"error","ts":"2025-09-25T10:29:42Z","logger":"nginxUpdater.commandService","msg":"error sending request to agent","error":"msg: Config apply failed, rolling back config; error: failed validating config NGINX config test failed exit status 1: 2025/09/25 10:29:37 [emerg] 34#34: host not found in upstream \"invalid-fqdn\" in /etc/nginx/includes/SnippetsFilter_http.server.location_default_fastcgi.conf:1\nnginx: [emerg] host not found in upstream \"invalid-fqdn\" in /etc/nginx/includes/SnippetsFilter_http.server.location_default_fastcgi.conf:1\nnginx: configuration file /etc/nginx/nginx.conf test failed\n\nmsg: ; error: failed to preform API action, NGINX Plus API is not configured\nmsg: Config apply failed, rollback successful; error: failed validating config NGINX config test failed exit status 1: 2025/09/25 10:29:37 [emerg] 34#34: host not found in upstream \"invalid-fqdn\" in /etc/nginx/includes/SnippetsFilter_http.server.location_default_fastcgi.conf:1\nnginx: [emerg] host not found in upstream \"invalid-fqdn\" in /etc/nginx/includes/SnippetsFilter_http.server.location_default_fastcgi.conf:1\nnginx: configuration file /etc/nginx/nginx.conf test failed\n\nmsg: ; error: failed to preform API action, NGINX Plus API is not configured\nmsg: ; error: failed to preform API action, NGINX Plus API is not configured\nmsg: ; error: failed to preform API action, NGINX Plus API is not configured\nmsg: ; error: failed to preform API action, NGINX Plus API is not configured\nmsg: ; error: failed to preform API action, NGINX Plus API is not configured\nmsg: ; error: failed to preform API action, NGINX Plus API is not configured\nmsg: ; error: failed to preform API action, NGINX Plus API is not configured\nmsg: ; error: failed to preform API action, NGINX Plus API is not configured\nmsg: ; error: failed to preform API action, NGINX Plus API is not 
configured","stacktrace":"github.com/nginx/nginx-gateway-fabric/v2/internal/controller/nginx/agent.(*commandService).logAndSendErrorStatus\n\t/home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/internal/controller/nginx/agent/command.go:365\ngithub.com/nginx/nginx-gateway-fabric/v2/internal/controller/nginx/agent.(*commandService).setInitialConfig\n\t/home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/internal/controller/nginx/agent/command.go:322\ngithub.com/nginx/nginx-gateway-fabric/v2/internal/controller/nginx/agent.(*commandService).Subscribe\n\t/home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/internal/controller/nginx/agent/command.go:149\ngithub.com/nginx/agent/v3/api/grpc/mpi/v1._CommandService_Subscribe_Handler\n\tpkg/mod/github.com/nginx/agent/[email protected]/api/grpc/mpi/v1/command_grpc.pb.go:233\ngithub.com/nginx/nginx-gateway-fabric/v2/internal/controller/nginx/agent/grpc/interceptor.(*ContextSetter).Stream.ContextSetter.Stream.func1\n\t/home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/internal/controller/nginx/agent/grpc/interceptor/interceptor.go:65\ngoogle.golang.org/grpc.(*Server).processStreamingRPC\n\tpkg/mod/google.golang.org/[email protected]/server.go:1728\ngoogle.golang.org/grpc.(*Server).handleStream\n\tpkg/mod/google.golang.org/[email protected]/server.go:1845\ngoogle.golang.org/grpc.(*Server).serveStreams.func2.1\n\tpkg/mod/google.golang.org/[email protected]/server.go:1061"}
{"level":"error","ts":"2025-09-25T10:29:42Z","logger":"eventHandler","msg":"Failed to update NGINX configuration","error":"msg: Config apply failed, rolling back config; error: failed validating config NGINX config test failed exit status 1: 2025/09/25 10:29:37 [emerg] 34#34: host not found in upstream \"invalid-fqdn\" in /etc/nginx/includes/SnippetsFilter_http.server.location_default_fastcgi.conf:1\nnginx: [emerg] host not found in upstream \"invalid-fqdn\" in /etc/nginx/includes/SnippetsFilter_http.server.location_default_fastcgi.conf:1\nnginx: configuration file /etc/nginx/nginx.conf test failed\n\nmsg: ; error: failed to preform API action, NGINX Plus API is not configured\nmsg: Config apply failed, rollback successful; error: failed validating config NGINX config test failed exit status 1: 2025/09/25 10:29:37 [emerg] 34#34: host not found in upstream \"invalid-fqdn\" in /etc/nginx/includes/SnippetsFilter_http.server.location_default_fastcgi.conf:1\nnginx: [emerg] host not found in upstream \"invalid-fqdn\" in /etc/nginx/includes/SnippetsFilter_http.server.location_default_fastcgi.conf:1\nnginx: configuration file /etc/nginx/nginx.conf test failed\n\nmsg: ; error: failed to preform API action, NGINX Plus API is not configured\nmsg: ; error: failed to preform API action, NGINX Plus API is not configured\nmsg: ; error: failed to preform API action, NGINX Plus API is not configured\nmsg: ; error: failed to preform API action, NGINX Plus API is not configured\nmsg: ; error: failed to preform API action, NGINX Plus API is not configured\nmsg: ; error: failed to preform API action, NGINX Plus API is not configured\nmsg: ; error: failed to preform API action, NGINX Plus API is not configured\nmsg: ; error: failed to preform API action, NGINX Plus API is not configured\nmsg: ; error: failed to preform API action, NGINX Plus API is not 
configured","stacktrace":"github.com/nginx/nginx-gateway-fabric/v2/internal/controller.(*eventHandlerImpl).waitForStatusUpdates\n\t/home/runner/work/nginx-gateway-fabric/nginx-gateway-fabric/internal/controller/handler.go:262"}
  5. Remove the application and NGF resources:
$ kubectl delete -f .
configmap "phpinfo" deleted from default namespace
deployment.apps "php-fpm" deleted from default namespace
service "php-fpm" deleted from default namespace
gateway.gateway.networking.k8s.io "gateway" deleted from default namespace
snippetsfilter.gateway.nginx.org "fastcgi" deleted from default namespace
httproute.gateway.networking.k8s.io "php-fpm" deleted from default namespace
  6. Deploying a working application with a known-good set of NGF manifests, such as https://github.com/f5devcentral/NGINX-Gateway-Fabric-Lab/tree/main/labs/1.basic-app, shows that the NGF control plane pod is stuck. Gateway objects are not provisioned, and apparently the only way out is to uninstall NGF through its Helm chart and install it again:
NAME      CLASS   ADDRESS   PROGRAMMED   AGE
gateway   nginx             Unknown      31s

$ kubectl describe gateway gateway
Name:         gateway
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  gateway.networking.k8s.io/v1
Kind:         Gateway
Metadata:
  Creation Timestamp:  2025-09-25T10:34:43Z
  Generation:          1
  Resource Version:    64123979
  UID:                 2993bd59-0647-45a9-9aa2-abbb0b6fb197
Spec:
  Gateway Class Name:  nginx
  Listeners:
    Allowed Routes:
      Namespaces:
        From:  Same
    Hostname:  *.example.com
    Name:      http
    Port:      80
    Protocol:  HTTP
Status:
  Conditions:
    Last Transition Time:  1970-01-01T00:00:00Z
    Message:               Waiting for controller
    Reason:                Pending
    Status:                Unknown
    Type:                  Accepted
    Last Transition Time:  1970-01-01T00:00:00Z
    Message:               Waiting for controller
    Reason:                Pending
    Status:                Unknown
    Type:                  Programmed
Events:                    <none>
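
The stuck state in the outputs above shows up in the PROGRAMMED column (False, then Unknown). As a minimal sketch, not part of the original report, the helper below reads `kubectl get gateway` table output on stdin and lists every Gateway that is not programmed. The function name `unprogrammed_gateways` is hypothetical, and the parsing assumes the default table layout shown in this issue.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get gateway` table output on stdin and
# print "NAME PROGRAMMED" for every Gateway whose PROGRAMMED value is not "True".
# ADDRESS can be empty (as in the stuck state), so PROGRAMMED is taken as the
# second-to-last whitespace-separated field rather than a fixed column.
unprogrammed_gateways() {
  awk 'NR > 1 && NF >= 4 && $(NF-1) != "True" { print $1, $(NF-1) }'
}

# Usage (requires cluster access):
#   kubectl get gateway | unprogrammed_gateways
```

With the two tables from this report as input, it would print `gateway False` and `gateway Unknown` respectively. On a live cluster, `kubectl wait --for=condition=Programmed gateway/gateway --timeout=60s` is another way to check the same condition.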

Expected behavior

Applying SnippetsFilter resources with wrong configurations shouldn't break the NGF control plane pod.
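
To see which snippet the NGINX config test rejected, the `[emerg]` lines can be pulled out of the control plane logs. The following is a rough sketch under the assumption that the logs are the JSON lines shown in this issue, with embedded `\n` and `\"` escapes. The function name `extract_emerg` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read NGF control plane logs on stdin and print each
# distinct "nginx: [emerg] ..." message once.
extract_emerg() {
  awk '{ gsub(/\\n/, "\n"); print }' |   # turn JSON-escaped \n into real newlines
    grep '^nginx: \[emerg\]' |           # keep only the nginx emergency messages
    sed 's/\\"/"/g' |                    # unescape JSON-escaped double quotes
    sort -u                              # collapse the repeated messages
}

# Usage (requires cluster access; the pod name is a placeholder):
#   kubectl logs -n nginx-gateway <ngf-control-plane-pod> | extract_emerg
```

For the logs in this report, the expected result is a single line naming `invalid-fqdn` and the generated include file `SnippetsFilter_http.server.location_default_fastcgi.conf`.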

Your environment

  • NGF 2.1.1
  • "Vanilla" Kubernetes 1.30.6 running on Ubuntu on qemu
  • Tested (with the same outcome) exposing NGF in NodePort and LoadBalancer mode

1.phpapp.yaml
2.gateway.yaml
3.snippetsfilter.yaml
4.httproute.yaml
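
The attachment contents are not reproduced in this text. For readers who want to try the reproduction without them, here is a hypothetical reconstruction of a SnippetsFilter that would trigger the same error. The name `fastcgi`, namespace `default`, and context `http.server.location` are inferred from the include file name in the logs; the `v1alpha1` API version, the output file name, and the function name `write_snippetsfilter` are assumptions. The real 3.snippetsfilter.yaml may differ.

```shell
#!/bin/sh
# Hypothetical reconstruction, NOT the attached 3.snippetsfilter.yaml.
# Writes a SnippetsFilter manifest with an unresolvable fastcgi_pass host
# to the path given as the first argument.
write_snippetsfilter() {
  cat > "$1" <<'EOF'
apiVersion: gateway.nginx.org/v1alpha1   # assumed API version
kind: SnippetsFilter
metadata:
  name: fastcgi
  namespace: default
spec:
  snippets:
    - context: http.server.location
      value: fastcgi_pass invalid-fqdn:9000;
EOF
}

# Usage (the file name is illustrative):
#   write_snippetsfilter ./snippetsfilter.example.yaml
#   kubectl apply -f ./snippetsfilter.example.yaml
```

The HTTPRoute would then need to reference this filter through an ExtensionRef filter on one of its rules for the snippet to be rendered into the NGINX configuration.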
