Allow configuration of the MutatingWebhook failure policy #2711

@sidewinder12s

Description

Describe the bug
I ran into issues with TLS certs being regenerated due to these bugs:

#2312
#2264

Once the TLS certs changed, the MutatingWebhook for PodReadinessGate started failing and blocking the rollout of pods on services using this feature.

This was the error:

Error creating: Internal error occurred: failed calling webhook "mpod.elbv2.k8s.aws": Post "https://aws-lb-controller-webhook-service.kube-system.svc:443/mutate-v1-pod?timeout=10s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "aws-load-balancer-controller-ca")

I think this exposes an availability concern: if all the pods backing a service get rescheduled while the MutatingWebhook is broken, the service will go down. My understanding is that PodReadinessGate is a bonus feature that makes rollouts smoother in Kubernetes, and I think it would be preferable for the feature to simply not work rather than block rollouts altogether.

Steps to reproduce

Break the TLS certs on the LB controller while using PodReadinessGates, then reschedule the pods backing an LB in a namespace that uses the feature.

Expected outcome
I'd like either to be able to configure the webhook's failure policy, or for it to default to failing open.
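
For illustration, a fail-open policy corresponds to `failurePolicy: Ignore` on the webhook entry of the MutatingWebhookConfiguration. The fragment below is only a sketch of what that could look like, not the controller's actual manifest: the webhook name, service name, namespace, and path are taken from the error above, while the resource name `aws-load-balancer-webhook` and the `admissionReviewVersions` value are assumptions. It omits `caBundle` and other fields, so it is not meant to be applied as-is.

```yaml
# Sketch only: not the controller's shipped manifest.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: aws-load-balancer-webhook        # assumed resource name; check your cluster
webhooks:
  - name: mpod.elbv2.k8s.aws             # webhook name from the error message
    failurePolicy: Ignore                # fail open: admit the pod if the webhook call fails
    clientConfig:
      service:
        name: aws-lb-controller-webhook-service   # from the error message
        namespace: kube-system
        path: /mutate-v1-pod
    admissionReviewVersions: ["v1"]      # placeholder value, required by the v1 API
    sideEffects: None
```

With `Ignore`, a pod created while the webhook is unreachable would be admitted without the readiness gate injected, which is the trade-off requested here. A manual change like this may be overwritten the next time the controller's manifests or chart are applied, which is why a supported configuration option would be preferable.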

Environment

  • AWS Load Balancer controller version: 2.4.1
  • Kubernetes version: 1.21
  • Using EKS (yes/no), if so version? Yes, platform version 7

Additional Context:

Metadata

Labels

  • good first issue: Denotes an issue ready for a new contributor, according to the "help wanted" guidelines.
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
