Conversation

@alebedev87
Contributor

@alebedev87 alebedev87 commented Oct 24, 2025

This PR introduces HTTPKeepAliveTimeout tuning option to the IngressController API, allowing customers to configure timeout http-keep-alive.

In OCP versions prior to 4.16, this timeout was not respected (see haproxy/haproxy#2334). This addition brings the ability to adjust the behavior to match pre-4.16 configurations.

Xref old RFE: https://issues.redhat.com/browse/RFE-1284.

@openshift-ci-robot openshift-ci-robot added the jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. label Oct 24, 2025
@openshift-ci
Contributor

openshift-ci bot commented Oct 24, 2025

Hello @alebedev87! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 24, 2025
@openshift-ci-robot

@alebedev87: This pull request references Jira Issue OCPBUGS-61858, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

This PR introduces HTTPKeepAliveTimeout tuning option to the IngressController API, allowing customers to configure timeout http-keep-alive.

In OCP versions prior to 4.16, this timeout was not respected (see haproxy/haproxy#2334). This addition brings the ability to adjust the behavior to match pre-4.16 configurations.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@alebedev87
Contributor Author

/jira refresh

@openshift-ci-robot openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Oct 24, 2025
@openshift-ci-robot

@alebedev87: This pull request references Jira Issue OCPBUGS-61858, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @ShudiLi

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Oct 24, 2025
@openshift-ci openshift-ci bot requested a review from ShudiLi October 24, 2025 12:12
Comment on lines 1887 to 1888
// httpKeepAliveTimeout defines the maximum allowed time to wait for
// a new HTTP request to appear.
Contributor

This refers to waiting for a new HTTP request to appear on an idle connection that is being considered for closure, right? If that is the purpose, or one of the purposes of this timeout, it should be mentioned.

Contributor Author

@alebedev87 alebedev87 Oct 24, 2025

Closure of an idle connection is one of the purposes. There are quite a few others mentioned in the HAProxy docs. I didn't want to favor one over another or list them all (as there are many). I think this message is consistent with the other timeouts we have.

Contributor

It's not clear where the wait is happening, so it would be better to be explicit here, or give examples.

Contributor Author

Rephrased it to:

// httpKeepAliveTimeout defines the maximum allowed time to wait for
// a new HTTP request to appear on a connection from the client to the router.

I hope this makes it clearer.

// fraction and a unit suffix, e.g. "300ms", "1.5h" or "2h45m".
// Valid time units are "ns", "us" (or "µs" U+00B5 or "μs" U+03BC), "ms", "s", "m", "h".
//
// When omitted, this means the user has no opinion and the platform is left
Contributor

I think adding a section starting with "// Setting this field is generally not recommended..." is always helpful. We have it on most of the other tuning options to help people understand the consequence of changing the default value, with explanation for what happens if you set it too high (idle connections remain open longer and use unnecessary resources?), and what happens if you set it too low (idle connection could be closed sooner than wanted and interrupt traffic?).

Contributor Author

"// Setting this field is generally not recommended..."

I cannot say that it's "not recommended". It's a prerogative of the customer; that's why we had an RFE for it. I can elaborate on corner cases though.

Contributor Author

I added 2 "paragraphs" about the potential impact of setting low and high values.

// to choose a reasonable default. This default is subject to change over time.
// The current default is 300s.
//
// +kubebuilder:validation:Pattern=^(0|([0-9]+(\.[0-9]+)?(ns|us|µs|μs|ms|s|m|h))+)$
Member

// +kubebuilder:validation:Format=duration

Does this work instead? I spotted it on another field above.

Contributor Author

@alebedev87 alebedev87 Oct 27, 2025

Right, the duration validation is not aligned across the tuning options. I used the explicit regex because kubebuilder's duration format is not the same as golang's duration format. We have a bug which showcases this for the client timeout.
However, this made me realize that I forgot to add tests for the new field; I will add them.

// +kubebuilder:validation:Pattern=^(0|([0-9]+(\.[0-9]+)?(ns|us|µs|μs|ms|s|m|h))+)$
// +kubebuilder:validation:Type:=string
// +optional
HTTPKeepAliveTimeout *metav1.Duration `json:"httpKeepAliveTimeout,omitempty"`
Member

@saschagrunert saschagrunert Oct 27, 2025

Note

We prefer not to use duration values anymore. Instead, we would create a int32 type, with units in the name. For example, this should be httpKeepAliveTimeoutSeconds.

Referring linter: kubernetes-sigs/kube-api-linter#24

We have a bunch of other *metav1.Duration types as part of this structure and I think we should keep them consistent with the new field.

Contributor Author

We have a bunch of other *metav1.Duration types as part of this structure and I think we should keep them consistent with the new field.

I think this makes sense for new APIs (or new fields of existing APIs). But here I'm thinking of consistency in the scope of the same API field. All the other timeouts we have in IngressController.Spec.TuningOptions are metav1.Duration. Using httpKeepAliveTimeoutSeconds would break the existing pattern and harm the user experience. I acknowledge the new rule, but I would like to stay consistent with the other timeouts, unless it's a hard requirement without which we won't get approval from the API team.
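To illustrate the consistency argument, a hypothetical IngressController manifest keeping the new field in the same duration-string style as the neighboring tuningOptions timeouts (field values are illustrative):

```yaml
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  tuningOptions:
    # New field introduced by this PR; existing timeouts shown for comparison.
    httpKeepAliveTimeout: 50s
    clientTimeout: 30s
    serverTimeout: 30s
```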

@alebedev87 alebedev87 force-pushed the OCPBUGS-61858-http-keep-alive-timeout branch from 7a86cb6 to 61a5799 Compare October 27, 2025 16:39
@openshift-ci openshift-ci bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 27, 2025
Comment on lines 1898 to 1905
// Low values (tens of milliseconds or less) can cause clients to close and reopen connections
// for each request, leading to excessive TCP or SSL handshakes.
// For HTTP/2, special care should be taken with low values.
// A few seconds is a reasonable starting point to avoid holding idle connections open
// while still allowing subsequent requests to reuse the connection.
//
// High values (more than a minute) can cause idle connections to linger,
// increasing exposure to long-lived but inactive connection attacks.
Contributor

Since our default is 300s, I don't think we can bluntly say that values of more than a minute increase exposure to attacks. If so, why don't we change the default now?

As a matter of fact, why not mention the use case from the bug? Also, unless we've tested every single value, we can't really be explicit about what is a high or low value.

Suggested change
// Low values (tens of milliseconds or less) can cause clients to close and reopen connections
// for each request, leading to excessive TCP or SSL handshakes.
// For HTTP/2, special care should be taken with low values.
// A few seconds is a reasonable starting point to avoid holding idle connections open
// while still allowing subsequent requests to reuse the connection.
//
// High values (more than a minute) can cause idle connections to linger,
// increasing exposure to long-lived but inactive connection attacks.
// Setting this value requires a careful consideration of the impact. Choosing a value too
// low can cause clients to close and reopen connections for every request, leading to reduced
// connection sharing. Choosing a value too high can cause idle connections to linger,
// increasing exposure to long-lived but inactive connection attacks.
//
// For HTTP/2, special care should be taken with low values.
// A few seconds is a reasonable starting point to avoid holding idle connections open
// while still allowing subsequent requests to reuse the connection.

Contributor Author

Since our default is 300s, I don't think we can can bluntly say that values more than a minute increases exposure to attacks.

I'm going to rephrase it to:

// High values (minutes or more) favor connection reuse but may cause idle
// connections to linger longer.

to highlight more the "reuse" aspect which we chose initially with 300s value.

If so, why don't we change the default now?

I believe that there is no perfect default value here. We favor connection reuse by default. Some customers want this timeout to be even longer (like in https://issues.redhat.com/browse/RFE-1284), some prefer it to be shorter (like in https://issues.redhat.com/browse/OCPBUGS-61858). Changing the default can result in a behavior change, as we saw in https://issues.redhat.com/browse/OCPBUGS-61858, where the bug was fixed and timeout http-keep-alive started to be respected, leading to more concurrent connections.

As a matter of fact, why not mention the use case from the bug?

Which use case from the bug do you think can be mentioned here? Just for context: https://issues.redhat.com/browse/OCPBUGS-61858 discovered a fix for timeout http-keep-alive which was done in HAProxy 2.8. In HAProxy 2.6 the timeout http-keep-alive was not respected and fell back to the shorter client timeout.

Also, unless we've tested every single value, we can't really be explicit about what is a high or low value.

I feel like we have to define what "low" and "high" values are, at least in order of magnitude. Otherwise this will always raise a follow-up question about what is "low" and what is "high".

HAProxy documentation is explicit about the values, I think we can safely use them:

In general "timeout http-keep-alive" is best used to prevent clients from
holding open an otherwise idle connection too long on sites seeing large
amounts of short connections. This can be accomplished by setting the value
to a few tens to hundreds of milliseconds in HTTP/1.1.

Another use case is the exact opposite: some sites want to permit clients
to reuse idle connections for a long time (e.g. 30 seconds to one minute)

A suggested low starting value for HTTP/2 connections would be around
4 seconds.
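For context, a minimal haproxy.cfg fragment (illustrative; the router's actual template output may differ) showing the directive this field tunes:

```
defaults
  mode http
  # Close a keep-alive connection if no new request arrives within this time.
  # The PR surfaces this value via spec.tuningOptions.httpKeepAliveTimeout,
  # which reaches the router through the ROUTER_SLOWLORIS_HTTP_KEEPALIVE
  # environment variable (see the QE verification in this thread).
  timeout http-keep-alive 300s
```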

Contributor

Disregarding the word-smithing, if our default is still 5 minutes, then I think we're not addressing the bug.

Contributor Author

The bug will not be addressed by changing the default value because this will impact all the other customers who use HAProxy 2.8 right now. We will suggest the customer from https://issues.redhat.com//browse/OCPBUGS-61858 to set the timeout to the value they need.

This commit introduces `HTTPKeepAliveTimeout` tuning option to
the IngressController API, allowing customers to configure
`timeout http-keep-alive`.

In OCP versions prior to 4.16, this timeout was not respected
(see haproxy/haproxy#2334).
This addition brings the ability to adjust the behavior
to match pre-4.16 configurations.
@alebedev87 alebedev87 force-pushed the OCPBUGS-61858-http-keep-alive-timeout branch from 61a5799 to a47ad11 Compare October 28, 2025 10:15
@openshift-ci
Contributor

openshift-ci bot commented Oct 28, 2025

@alebedev87: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ShudiLi
Member

ShudiLi commented Oct 29, 2025

Tested it with 4.21.0-0-2025-10-29-070835-test-ci-ln-g86vs5k-latest

1.
% oc get clusterversion
NAME      VERSION                                                AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.21.0-0-2025-10-29-070835-test-ci-ln-g86vs5k-latest   True        False         87m     Cluster version is 4.21.0-0-2025-10-29-070835-test-ci-ln-g86vs5k-latest

2. Configure the httpKeepAliveTimeout with different values, including the invalid 50d
% oc -n openshift-ingress-operator get ingresscontroller default -oyaml | yq ".spec.tuningOptions"
httpKeepAliveTimeout: 50ms
reloadInterval: 0s
% oc -n openshift-ingress-operator get ingresscontroller default -oyaml | yq ".spec.tuningOptions"
httpKeepAliveTimeout: 50m
reloadInterval: 0s
% oc -n openshift-ingress-operator get ingresscontroller default -oyaml | yq ".spec.tuningOptions"
httpKeepAliveTimeout: 50h
reloadInterval: 0s

# ingresscontrollers.operator.openshift.io "default" was not valid:
# * spec.tuningOptions.httpKeepAliveTimeout: Invalid value: "50d": spec.tuningOptions.httpKeepAliveTimeout in body should match '^(0|([0-9]+(\.[0-9]+)?(ns|us|µs|μs|ms|s|m|h))+)$'
#


3. Configure it with 50s
% oc -n openshift-ingress-operator get ingresscontroller default -oyaml | yq ".spec.tuningOptions"
httpKeepAliveTimeout: 50s
reloadInterval: 0s

% oc -n openshift-ingress rsh router-default-94887c8b8-5kk8w
sh-5.1$ env | grep -i alive
ROUTER_SLOWLORIS_HTTP_KEEPALIVE=50s
sh-5.1$
sh-5.1$

4. create pods, service and route for the function test
% oc -n test get route
NAME          HOST/PORT                                                                     PATH   SERVICES      PORT          TERMINATION   WILDCARD
unsec-apach   unsec-apach-test.apps.ci-ln-g86vs5k-72292.origin-ci-int-gce.dev.rhcloud.com          unsec-apach   unsec-apach                 None

5. Send traffic and check that after about 50s a FIN is received on the client side
sh-4.4# tcpdump -i any port 80 -s 0 -n -v
dropped privs to tcpdump
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
09:13:45.977861 IP (tos 0x0, ttl 64, id 23269, offset 0, flags [DF], proto TCP (6), length 60)
    10.128.2.18.55864 > 136.113.209.231.http: Flags [S], cksum 0x6719 (incorrect -> 0xee2b), seq 166188194, win 64680, options [mss 1320,sackOK,TS val 385033927 ecr 0,nop,wscale 7], length 0
09:13:45.981353 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    136.113.209.231.http > 10.128.2.18.55864: Flags [S.], cksum 0xa864 (correct), seq 3504820758, ack 166188195, win 65400, options [mss 1320,sackOK,TS val 4180289213 ecr 385033927,nop,wscale 7], length 0
09:13:45.981407 IP (tos 0x0, ttl 64, id 23270, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.18.55864 > 136.113.209.231.http: Flags [.], cksum 0x6711 (incorrect -> 0xd41f), ack 1, win 506, options [nop,nop,TS val 385033931 ecr 4180289213], length 0
09:13:45.981594 IP (tos 0x0, ttl 64, id 23271, offset 0, flags [DF], proto TCP (6), length 159)
    10.128.2.18.55864 > 136.113.209.231.http: Flags [P.], cksum 0x677c (incorrect -> 0x66fa), seq 1:108, ack 1, win 506, options [nop,nop,TS val 385033931 ecr 4180289213], length 107: HTTP, length: 107
	GET /a1.txt  HTTP/1.1
	Host:unsec-apach-test.apps.ci-ln-g86vs5k-72292.origin-ci-int-gce.dev.rhcloud.com
	
09:13:45.986643 IP (tos 0x0, ttl 62, id 54542, offset 0, flags [DF], proto TCP (6), length 477)
    136.113.209.231.http > 10.128.2.18.55864: Flags [P.], cksum 0xc2f2 (correct), seq 1:426, ack 108, win 511, options [nop,nop,TS val 4180289218 ecr 385033931], length 425: HTTP, length: 425
	HTTP/1.1 200 OK
	date: Wed, 29 Oct 2025 09:13:45 GMT
	server: Apache/2.4.37 (centos) OpenSSL/1.1.1k
	upgrade: h2c
	connection: Upgrade
	last-modified: Wed, 29 Oct 2025 09:05:38 GMT
	etag: "c-642486fa9ac83"
	accept-ranges: bytes
	content-length: 12
	content-type: text/plain; charset=UTF-8
	set-cookie: fe27bc5d3b8db7d9308afdc84b692496=aba716f055ad746da27df0e3d2408bfa; path=/; HttpOnly
	cache-control: private
	
	aaa111 text
09:13:45.986681 IP (tos 0x0, ttl 64, id 23272, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.18.55864 > 136.113.209.231.http: Flags [.], cksum 0x6711 (incorrect -> 0xd204), ack 426, win 503, options [nop,nop,TS val 385033936 ecr 4180289218], length 0
09:14:35.989919 IP (tos 0x0, ttl 62, id 54543, offset 0, flags [DF], proto TCP (6), length 52)
    136.113.209.231.http > 10.128.2.18.55864: Flags [F.], cksum 0x0ea9 (correct), seq 426, ack 108, win 511, options [nop,nop,TS val 4180339220 ecr 385033936], length 0
09:14:36.030205 IP (tos 0x0, ttl 64, id 23273, offset 0, flags [DF], proto TCP (6), length 52)
    10.128.2.18.55864 > 136.113.209.231.http: Flags [.], cksum 0x6711 (incorrect -> 0x4b34), ack 427, win 503, options [nop,nop,TS val 385083980 ecr 4180339220], length 0
09:14:45.959906 IP (tos 0x0, ttl 64, id 23274, offset 0, flags [DF], proto TCP (6), length 158)
    10.128.2.18.55864 > 136.113.209.231.http: Flags [P.], cksum 0x677b (incorrect -> 0x1a03), seq 108:214, ack 427, win 503, options [nop,nop,TS val 385093909 ecr 4180339220], length 106: HTTP, length: 106
	GET /a2.txt HTTP/1.1
	Host:unsec-apach-test.apps.ci-ln-g86vs5k-72292.origin-ci-int-gce.dev.rhcloud.com
	
09:14:45.961091 IP (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    136.113.209.231.http > 10.128.2.18.55864: Flags [R], cksum 0x49c4 (correct), seq 3504821185, win 0, length 0

@ShudiLi
Member

ShudiLi commented Oct 29, 2025

/label qe-approved
thanks

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label Oct 29, 2025
@openshift-ci-robot

@alebedev87: This pull request references Jira Issue OCPBUGS-61858, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.21.0) matches configured target version for branch (4.21.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @ShudiLi

In response to this:

This PR introduces HTTPKeepAliveTimeout tuning option to the IngressController API, allowing customers to configure timeout http-keep-alive.

In OCP versions prior to 4.16, this timeout was not respected (see haproxy/haproxy#2334). This addition brings the ability to adjust the behavior to match pre-4.16 configurations.

Xref old RFE: https://issues.redhat.com/browse/RFE-1284.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Member

@saschagrunert saschagrunert left a comment

LGTM from an API shadow reviewer perspective.

@openshift-ci
Contributor

openshift-ci bot commented Oct 29, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: saschagrunert
Once this PR has been reviewed and has the lgtm label, please assign joelspeed for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
