xds/resolver: Optimize Interceptor Chain Construction #8641

easwars · 2025-10-10T23:07:45Z

Existing behavior:

At routing time, when an RPC matches a route and a cluster is selected, the interceptor chain for that specific RPC is built.
This chain is built on a per-RPC basis.
A subsequent RPC that matches the exact same route and cluster will trigger the entire chain reconstruction again, even if no configuration has changed.

New behavior:

The interceptor chain is now pre-built for every route and every pickable cluster associated with that route.
The chains are constructed once when the config selector is built.
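
To illustrate the difference, here is a minimal sketch of pre-building one chain per (route, cluster) pair when the config selector is created. It is not the actual resolver code: `interceptor`, `routeCluster`, `buildChain`, `selectConfig`, and the `buildCount` counter are hypothetical names made up for this example, and filters are reduced to plain strings.

```go
package main

import "fmt"

// interceptor is a hypothetical stand-in for the resolver's client
// interceptor type; here it just wraps the RPC name in filter names.
type interceptor func(rpc string) string

// buildCount records how many chains were constructed (demo only).
var buildCount int

// buildChain composes the given filter names into a single interceptor.
func buildChain(filters []string) interceptor {
	buildCount++
	return func(rpc string) string {
		out := rpc
		for _, f := range filters {
			out = f + "(" + out + ")"
		}
		return out
	}
}

// routeCluster identifies one pickable cluster of one route.
type routeCluster struct{ route, cluster string }

// configSelector holds one pre-built chain per (route, cluster) pair.
type configSelector struct {
	chains map[routeCluster]interceptor
}

// newConfigSelector builds every chain up front, once per config update.
func newConfigSelector(routes map[string][]string, filters []string) *configSelector {
	cs := &configSelector{chains: make(map[routeCluster]interceptor)}
	for route, clusters := range routes {
		for _, cluster := range clusters {
			cs.chains[routeCluster{route, cluster}] = buildChain(filters)
		}
	}
	return cs
}

// selectConfig is the per-RPC path: a map lookup, with no chain construction.
func (cs *configSelector) selectConfig(route, cluster string) (interceptor, bool) {
	ic, ok := cs.chains[routeCluster{route, cluster}]
	return ic, ok
}

func main() {
	routes := map[string][]string{"/svc/": {"cluster_a", "cluster_b"}}
	cs := newConfigSelector(routes, []string{"fault", "router"})
	for i := 0; i < 3; i++ {
		ic, _ := cs.selectConfig("/svc/", "cluster_a")
		fmt.Println(ic("rpc"))
	}
	fmt.Println("chains built:", buildCount)
}
```

In this sketch the number of chains built depends only on the number of (route, cluster) pairs, not on the number of RPCs.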

Other changes:

Existing unit tests have been converted to more e2e-style tests.
This lays the groundwork for upcoming changes to the filter API, specifically to support filter state retention.

RELEASE NOTES: NONE

…tor, not for every RPC

easwars · 2025-10-10T23:09:21Z

@eshitachandwani: FYI, this might conflict with your resolver changes for A74. And since you have been looking at the resolver code for some time now, this would be a good PR for you to review.

codecov · 2025-10-10T23:11:35Z

Codecov Report

❌ Patch coverage is 46.87500% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.09%. Comparing base (8110884) to head (226e890).
⚠️ Report is 24 commits behind head on master.

Files with missing lines	Patch %	Lines
internal/xds/resolver/serviceconfig.go	45.00%	4 Missing and 7 partials ⚠️
internal/xds/resolver/xds_resolver.go	50.00%	3 Missing and 3 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #8641      +/-   ##
==========================================
+ Coverage   79.45%   83.09%   +3.64%     
==========================================
  Files         415      415              
  Lines       41339    32144    -9195     
==========================================
- Hits        32844    26710    -6134     
+ Misses       6621     4021    -2600     
+ Partials     1874     1413     -461

Files with missing lines	Coverage Δ
internal/xds/resolver/xds_resolver.go	`73.64% <50.00%> (-8.13%)`	⬇️
internal/xds/resolver/serviceconfig.go	`65.03% <45.00%> (-20.53%)`	⬇️

... and 365 files with indirect coverage changes


dfawley · 2025-10-15T16:18:05Z

@arjan-bal could you review this change, please?

arjan-bal

Leaving my comments on the non-test code, still reviewing the tests.

arjan-bal · 2025-10-23T15:39:10Z

internal/xds/resolver/serviceconfig.go

+// The filter config override maps contain overrides from the route, cluster,
+// and virtual host respectively. The cluster override has the highest priority,
+// followed by the route override, and finally the virtual host override.
+func newInterceptor(filters []xdsresource.HTTPFilter, cluster, route, virtualHost map[string]httpfilter.FilterConfig) (iresolver.ClientInterceptor, error) {


nit: What do you think about adding an override suffix to cluster, route, and virtualHost? I find it can help with readability, but no worries if you prefer the current naming.
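
For reference, the priority order described in the doc comment (cluster first, then route, then virtual host) could be sketched roughly as below, using the suggested `Override` suffix. This is only an illustration: `filterConfig` is a hypothetical stand-in for `httpfilter.FilterConfig`, and `resolveOverride` is a made-up helper, not a function in this PR.

```go
package main

import "fmt"

// filterConfig is a hypothetical stand-in for httpfilter.FilterConfig.
type filterConfig string

// resolveOverride returns the override for the named filter, checking the
// cluster overrides first, then the route overrides, and finally the virtual
// host overrides. It reports false if no level overrides the filter.
func resolveOverride(name string, clusterOverride, routeOverride, virtualHostOverride map[string]filterConfig) (filterConfig, bool) {
	// Ordered from highest to lowest priority. Reading a nil map is safe.
	for _, m := range []map[string]filterConfig{clusterOverride, routeOverride, virtualHostOverride} {
		if cfg, ok := m[name]; ok {
			return cfg, true
		}
	}
	return "", false
}

func main() {
	clusterOverride := map[string]filterConfig{"fault": "cluster-level"}
	routeOverride := map[string]filterConfig{"fault": "route-level", "rbac": "route-level"}
	virtualHostOverride := map[string]filterConfig{"rbac": "vhost-level", "router": "vhost-level"}
	for _, name := range []string{"fault", "rbac", "router", "unknown"} {
		cfg, ok := resolveOverride(name, clusterOverride, routeOverride, virtualHostOverride)
		fmt.Printf("%s: %q (override found: %v)\n", name, cfg, ok)
	}
}
```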

arjan-bal · 2025-10-23T15:51:58Z

internal/xds/resolver/xds_resolver.go

-	cs := r.newConfigSelector()
+	cs, err := r.newConfigSelector()
+	if err != nil {
+		r.onResourceError(fmt.Errorf("xds: failed to create config selector: %v", err))


Should the function return early after calling onResourceError? I'm not sure what happens when a nil config selector is passed to sendNewServiceConfig.

I think we should return early: onResourceError already calls sendNewServiceConfig with an erroring config selector, so we should not call sendNewServiceConfig again with a nil config selector (which will set a default config selector in the client conn).

Thanks for spotting this. Fixed it.

Looks like we don't have any tests for this case, but it is also a very unlikely one. There are two cases in which building a config selector can fail:

A filter is part of the configuration, but is not supported on the client side. This case would not pass xDS client validation. (I had a panic for this case initially, but changed it to return an error as part of this review.)

The call to build the interceptor for the filter fails. This is also very unlikely, because if the filter successfully parsed the configuration (during resource parsing in the xDS client), there is very little reason for it to fail when trying to build the interceptor.

I'll file an issue anyway to add a test for this edge case.
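
The early-return concern discussed in this thread can be sketched as follows. This is a simplified model, not the resolver's actual code: the `resolver` and `configSelector` types, the `failBuild` switch, the `sent` log, and the `onResolutionComplete` wrapper are assumptions made for illustration. Only the method names `newConfigSelector`, `onResourceError`, and `sendNewServiceConfig` come from the diff and the discussion.

```go
package main

import (
	"errors"
	"fmt"
)

// configSelector is a hypothetical stand-in for the real config selector.
type configSelector struct{ desc string }

// resolver is a toy model that records every config selector it sends.
type resolver struct {
	failBuild bool              // forces newConfigSelector to fail (demo only)
	sent      []*configSelector // log of sendNewServiceConfig calls
}

func (r *resolver) newConfigSelector() (*configSelector, error) {
	if r.failBuild {
		return nil, errors.New("building interceptor failed")
	}
	return &configSelector{desc: "ok"}, nil
}

func (r *resolver) sendNewServiceConfig(cs *configSelector) {
	r.sent = append(r.sent, cs)
}

// onResourceError sends an erroring config selector, as described in the thread.
func (r *resolver) onResourceError(err error) {
	r.sendNewServiceConfig(&configSelector{desc: "erroring: " + err.Error()})
}

// onResolutionComplete is a hypothetical name for the function in the diff.
func (r *resolver) onResolutionComplete() {
	cs, err := r.newConfigSelector()
	if err != nil {
		r.onResourceError(fmt.Errorf("xds: failed to create config selector: %v", err))
		// Without this return, a nil config selector would be sent next.
		return
	}
	r.sendNewServiceConfig(cs)
}

func main() {
	r := &resolver{failBuild: true}
	r.onResolutionComplete()
	fmt.Printf("updates sent: %d, first: %q\n", len(r.sent), r.sent[0].desc)
}
```

A test for the edge case mentioned above could follow the same shape: force the build to fail and check that exactly one update, the erroring one, is sent.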

arjan-bal · 2025-10-23T17:11:49Z

internal/xds/resolver/serviceconfig.go

+			// Should not happen if it passed xdsClient validation.
+			panic(fmt.Sprintf("filter %q does not support use in client", filter.Name))


What do you think about continuing to return an error here? The method signature already allows for it, so callers presumably have error handling in place. Panicking to crash the process feels like a worse alternative, especially if this is a recoverable error.

At some point in the recent past, I got comfortable using panics for cases that we definitely don't expect to happen unless there is a really bad programming error. I believe this is one such case. But since the caller of this function already handles an error, I'm OK with returning an error here instead of panicking. Thanks.
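
A rough sketch of the error-returning variant agreed on here, assuming a simplified filter representation. The `filter` struct, its `clientSupported` field, and `buildInterceptorNames` are hypothetical; only the error message text mirrors the diff.

```go
package main

import "fmt"

// filter is a hypothetical, simplified view of a configured HTTP filter.
type filter struct {
	name            string
	clientSupported bool
}

// buildInterceptorNames returns the names of the filters that would make up
// the chain, or an error if a filter cannot be used on the client side.
// Returning an error lets the caller report it instead of crashing the process.
func buildInterceptorNames(filters []filter) ([]string, error) {
	chain := make([]string, 0, len(filters))
	for _, f := range filters {
		if !f.clientSupported {
			// Should not happen if the config passed xDS client validation.
			return nil, fmt.Errorf("filter %q does not support use in client", f.name)
		}
		chain = append(chain, f.name)
	}
	return chain, nil
}

func main() {
	_, err := buildInterceptorNames([]filter{{"fault", true}, {"server_only", false}})
	fmt.Println("error:", err)
}
```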

arjan-bal · 2025-10-23T17:15:03Z