Contributor

easwars commented Oct 10, 2025

Existing behavior:

  • At routing time, when an RPC matches a route and a cluster is selected, the interceptor chain for that specific RPC is built.
  • This chain is built on a per-RPC basis.
  • A subsequent RPC that matches the exact same route and cluster will trigger the entire chain reconstruction again, even if no configuration has changed.

New behavior:

  • The interceptor chain is now pre-built for every route and every pickable cluster associated with that route.
  • The chains are constructed once when the config selector is built.
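The difference between the two behaviors can be sketched roughly as follows. This is a simplified illustration rather than the actual resolver code: `interceptor`, `buildChain`, `configSelector`, and `selectChain` are hypothetical names, and routes and clusters are reduced to plain strings.

```go
package main

import "fmt"

// interceptor is a hypothetical stand-in for a built filter chain.
type interceptor struct{ route, cluster string }

// buildCount tracks how many times a chain is constructed.
var buildCount int

// buildChain simulates the (potentially expensive) chain construction.
func buildChain(route, cluster string) *interceptor {
	buildCount++
	return &interceptor{route: route, cluster: cluster}
}

// configSelector holds chains pre-built for every (route, cluster) pair.
type configSelector struct {
	chains map[string]map[string]*interceptor
}

// newConfigSelector builds every chain once, when the selector is created.
func newConfigSelector(routes map[string][]string) *configSelector {
	cs := &configSelector{chains: make(map[string]map[string]*interceptor)}
	for route, clusters := range routes {
		cs.chains[route] = make(map[string]*interceptor)
		for _, cluster := range clusters {
			cs.chains[route][cluster] = buildChain(route, cluster)
		}
	}
	return cs
}

// selectChain is the per-RPC path: a lookup, with no construction.
func (cs *configSelector) selectChain(route, cluster string) *interceptor {
	return cs.chains[route][cluster]
}

func main() {
	cs := newConfigSelector(map[string][]string{"/svc/Method": {"A", "B"}})
	for i := 0; i < 1000; i++ {
		cs.selectChain("/svc/Method", "A")
	}
	fmt.Println(buildCount) // prints 2: one build per cluster, none per RPC
}
```

With per-RPC construction, the same loop would call `buildChain` a thousand times.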

Other changes:

  • Existing unit tests have been converted to e2e-style tests.
  • This lays the necessary groundwork for upcoming changes to the filter API, specifically to support filter state retention.

RELEASE NOTES: NONE

easwars added the "Area: xDS" and "Type: Internal Cleanup" labels Oct 10, 2025
easwars added this to the 1.77 Release milestone Oct 10, 2025
Contributor Author

easwars commented Oct 10, 2025

@eshitachandwani: FYI, this might conflict with your resolver changes for A74. Since you have been looking at the resolver code for some time now, this would be a good PR for you to review.

codecov bot commented Oct 10, 2025

Codecov Report

❌ Patch coverage is 46.87500% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.09%. Comparing base (8110884) to head (226e890).
⚠️ Report is 24 commits behind head on master.

Files with missing lines                  Patch %   Lines
internal/xds/resolver/serviceconfig.go    45.00%    4 Missing and 7 partials ⚠️
internal/xds/resolver/xds_resolver.go     50.00%    3 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8641      +/-   ##
==========================================
+ Coverage   79.45%   83.09%   +3.64%     
==========================================
  Files         415      415              
  Lines       41339    32144    -9195     
==========================================
- Hits        32844    26710    -6134     
+ Misses       6621     4021    -2600     
+ Partials     1874     1413     -461     
Files with missing lines                  Coverage Δ
internal/xds/resolver/xds_resolver.go     73.64% <50.00%> (-8.13%) ⬇️
internal/xds/resolver/serviceconfig.go    65.03% <45.00%> (-20.53%) ⬇️

... and 365 files with indirect coverage changes

dfawley requested review from arjan-bal and removed request for dfawley October 15, 2025 16:17
dfawley assigned arjan-bal and unassigned dfawley Oct 15, 2025
Member

dfawley commented Oct 15, 2025

@arjan-bal could you review this change, please?

Contributor

arjan-bal left a comment

Leaving my comments on the non-test code, still reviewing the tests.

// The filter config override maps contain overrides from the route, cluster,
// and virtual host respectively. The cluster override has the highest priority,
// followed by the route override, and finally the virtual host override.
func newInterceptor(filters []xdsresource.HTTPFilter, cluster, route, virtualHost map[string]httpfilter.FilterConfig) (iresolver.ClientInterceptor, error) {
Contributor

nit: What do you think about adding an override suffix to cluster, route, and virtualHost? I find it can help with readability, but no worries if you prefer the current naming.

Contributor Author

Done.
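The precedence described in the comment on `newInterceptor` (cluster override first, then route, then virtual host) can be sketched as follows. This is an illustration under assumptions, not the code from the PR: `filterConfig` and `resolveOverride` are hypothetical names, and the parameter names use the `Override` suffix suggested in the review.

```go
package main

import "fmt"

// filterConfig is a hypothetical stand-in for httpfilter.FilterConfig.
type filterConfig string

// resolveOverride returns the highest-priority override for the named filter:
// cluster first, then route, then virtual host. ok is false if no map has one.
func resolveOverride(name string, clusterOverride, routeOverride, virtualHostOverride map[string]filterConfig) (cfg filterConfig, ok bool) {
	for _, m := range []map[string]filterConfig{clusterOverride, routeOverride, virtualHostOverride} {
		if c, found := m[name]; found {
			return c, true
		}
	}
	return "", false
}

func main() {
	cluster := map[string]filterConfig{"fault": "from-cluster"}
	route := map[string]filterConfig{"fault": "from-route", "rbac": "from-route"}
	vhost := map[string]filterConfig{"rbac": "from-vhost", "router": "from-vhost"}
	for _, name := range []string{"fault", "rbac", "router", "other"} {
		cfg, ok := resolveOverride(name, cluster, route, vhost)
		fmt.Printf("%s: %q %v\n", name, cfg, ok)
	}
}
```

Reading from a nil map is safe in Go, so a missing override level can simply be passed as `nil`.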

cs := r.newConfigSelector()
cs, err := r.newConfigSelector()
if err != nil {
r.onResourceError(fmt.Errorf("xds: failed to create config selector: %v", err))
Contributor

Should the function return early after calling onResourceError? I'm not sure what happens when a nil config selector is passed to sendNewServiceConfig.

Member

I think we should return early: onResourceError will call sendNewServiceConfig with an erroring config selector, so we should not call sendNewServiceConfig again with a nil config selector (which would set a default config selector in the client conn).

Contributor Author

Thanks for spotting this. Fixed it.

Looks like we don't have any tests for this case, but it is also a very unlikely case. There are two ways in which building a config selector can fail:

  • A filter is part of the configuration, but is not supported on the client side. This case would not pass xDS client validation. (I had a panic for this case initially, but changed it to return an error as part of this review.)
  • The call to build the interceptor for the filter fails. This is also very unlikely, because if the filter successfully parsed the configuration (during resource parsing in the xDS client), there is very little reason for it to fail when trying to build the interceptor.

I'll anyway file an issue to add a test for this edge case.
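The early return discussed in this thread can be sketched with a toy model. Everything here is a hypothetical simplification (the `resolver` type, its `sent` slice, `errBuild`, and `onUpdate` are not from the PR); only the method names `newConfigSelector`, `onResourceError`, and `sendNewServiceConfig` and the error message come from the quoted code.

```go
package main

import (
	"errors"
	"fmt"
)

// configSelector is a hypothetical stand-in for the resolver's config selector.
type configSelector struct{ desc string }

// errBuild simulates a filter failing to build its interceptor.
var errBuild = errors.New("filter failed to build interceptor")

// resolver is a toy model: buildErr controls whether selector creation fails,
// and sent records every selector passed to sendNewServiceConfig.
type resolver struct {
	buildErr error
	sent     []*configSelector
}

func (r *resolver) newConfigSelector() (*configSelector, error) {
	if r.buildErr != nil {
		return nil, r.buildErr
	}
	return &configSelector{desc: "ok"}, nil
}

func (r *resolver) sendNewServiceConfig(cs *configSelector) {
	r.sent = append(r.sent, cs)
}

// onResourceError pushes an erroring config selector, as described in the thread.
func (r *resolver) onResourceError(err error) {
	r.sendNewServiceConfig(&configSelector{desc: "erroring: " + err.Error()})
}

func (r *resolver) onUpdate() {
	cs, err := r.newConfigSelector()
	if err != nil {
		r.onResourceError(fmt.Errorf("xds: failed to create config selector: %v", err))
		return // Without this return, a nil selector would be sent next.
	}
	r.sendNewServiceConfig(cs)
}

func main() {
	r := &resolver{buildErr: errBuild}
	r.onUpdate()
	fmt.Println(len(r.sent), r.sent[0].desc)
}
```

In the failure case exactly one config is sent, and it is the erroring one rather than a nil selector.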

Comment on lines 346 to 347
// Should not happen if it passed xdsClient validation.
panic(fmt.Sprintf("filter %q does not support use in client", filter.Name))
Contributor

What do you think about continuing to return an error here? The method signature already allows for it, so callers presumably have error handling in place. Panicking to crash the process feels like a worse alternative, especially if this is a recoverable error.

Contributor Author

At some point in the recent past, I got comfortable using panics for cases where we definitely don't expect something to happen, without some really bad programming error. This I believe is one such. But since an error is already being handled by the caller of this function, I'm ok to return an error here instead of panicking. Thanks.
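A minimal sketch of the error-returning variant agreed on here, assuming simplified interfaces: `filter`, `clientInterceptorBuilder`, `clientFilter`, `serverOnlyFilter`, and `buildInterceptor` are hypothetical stand-ins, and only the error message is taken from the quoted code.

```go
package main

import "fmt"

// filter and clientInterceptorBuilder are hypothetical, simplified stand-ins
// for the httpfilter interfaces.
type filter interface{ TypeName() string }

type clientInterceptorBuilder interface {
	BuildClientInterceptor() (string, error)
}

type clientFilter struct{}

func (clientFilter) TypeName() string { return "client" }

func (clientFilter) BuildClientInterceptor() (string, error) {
	return "client-interceptor", nil
}

type serverOnlyFilter struct{}

func (serverOnlyFilter) TypeName() string { return "server-only" }

// buildInterceptor returns an error, rather than panicking, when the filter
// does not implement the optional client-side interface.
func buildInterceptor(name string, f filter) (string, error) {
	ib, ok := f.(clientInterceptorBuilder)
	if !ok {
		// Should not happen if the config passed validation, but the caller
		// already handles errors, so this stays recoverable.
		return "", fmt.Errorf("filter %q does not support use in client", name)
	}
	return ib.BuildClientInterceptor()
}

func main() {
	_, err := buildInterceptor("srv", serverOnlyFilter{})
	fmt.Println(err)
}
```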

fb.logger.Logf("BuildClientInterceptor called with config: %+v, override: %+v", config, override)

if config == nil {
panic("unexpected missing config")
Contributor

In my opinion printing an error to fail the test may be preferred to panicking from a spawned goroutine. The situation seems similar to the style guide recommendation about only calling t.Fatalf from the main test function.

Contributor Author

Done.

// by the backend, allowing tests to assert that the correct filter
// configuration was applied for each RPC.
type testHTTPFilterWithRPCMetadata struct {
logger logger
Contributor

nit: Is there a benefit to using a logger interface here? If not, I think it's simpler to directly store a testing.T here.

Contributor Author

It's probably an overkill, but I feel that it's better to do this instead of accepting a testing.T, since we can very easily start calling other methods on it like Error or Fatal if we have access to it.
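The trade-off can be illustrated with a narrow interface that exposes only `Logf`. The exact shape of the `logger` interface in the PR is not shown in this thread, so the definition below is an assumption, and `recordingLogger` and `testFilter` are hypothetical names used so the sketch runs outside a test.

```go
package main

import "fmt"

// logger exposes only Logf. *testing.T satisfies this interface, but code
// holding a logger cannot call Errorf or Fatalf on it.
type logger interface {
	Logf(format string, args ...any)
}

// recordingLogger is a hypothetical logger used in place of *testing.T.
type recordingLogger struct{ lines []string }

func (l *recordingLogger) Logf(format string, args ...any) {
	l.lines = append(l.lines, fmt.Sprintf(format, args...))
}

// testFilter stores the narrow interface instead of a *testing.T.
type testFilter struct{ logger logger }

func (f *testFilter) build(config string) {
	f.logger.Logf("BuildClientInterceptor called with config: %+v", config)
}

func main() {
	l := &recordingLogger{}
	f := &testFilter{logger: l}
	f.build("cfg")
	fmt.Println(l.lines[0])
}
```

In a real test, `t` could be passed wherever a `logger` is expected, and the compiler would reject any attempt by the filter to fail the test directly.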


func (*testHTTPFilterWithRPCMetadata) IsTerminal() bool { return false }

var _ httpfilter.ClientInterceptorBuilder = &testHTTPFilterWithRPCMetadata{}
Contributor

nit: I think we should add a comment here stating that ClientInterceptorBuilder is an optional interface for Filters to implement so this compile time check ensures the test filter implements it. In my opinion this would help readers.

Contributor Author

Done.
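For readers unfamiliar with the idiom, a compile-time interface check with the kind of comment requested here might look like the sketch below. The interface and type are reduced, hypothetical versions (`interceptorBuilder`, `testFilter`, `supportsClient`), not the real `httpfilter` definitions.

```go
package main

import "fmt"

// interceptorBuilder is a hypothetical, reduced version of an optional
// interface that filters may implement to support client-side use.
type interceptorBuilder interface {
	BuildClientInterceptor() string
}

type testFilter struct{}

func (*testFilter) BuildClientInterceptor() string { return "test-interceptor" }

// interceptorBuilder is optional for filters, so nothing else forces
// testFilter to implement it. This assignment makes the build fail if the
// method is ever removed or its signature changes.
var _ interceptorBuilder = &testFilter{}

// supportsClient performs the equivalent check at run time.
func supportsClient(f any) bool {
	_, ok := f.(interceptorBuilder)
	return ok
}

func main() {
	fmt.Println(supportsClient(&testFilter{}))
}
```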

}
var errStr string
if newStreamErr != nil {
errStr = newStreamErr.Error()
Contributor

I see that we're converting a string to an error, assigning it to testFilterCfg.newStreamErr and converting it back to a string here. Can the error string be stored in testFilterCfg.newStreamErr to avoid these conversions?

Contributor Author

Done.

}

func (cs *clientStream) Context() context.Context {
return cs.ctx
Contributor

Do we need to store a context here? Can we return cs.ClientStream.Context() instead?

Contributor Author

Looks like I don't need this type at all. I was probably doing something else at some point in time that required this type. But with the current state of the test and the current implementation of testFilterInterceptor, we don't need this type at all. Thanks.

Comment on lines +290 to +292
if internal.NewXDSResolverWithConfigForTesting == nil {
t.Fatalf("internal.NewXDSResolverWithConfigForTesting is nil")
}
Contributor

nit: We can avoid this check and let the code panic if we don't expect to fail in most cases.

Contributor

There is a bunch of commented code here that probably needs to be removed. Can you please take a look?

Contributor Author

Done. Thanks.

arjan-bal removed their assignment Oct 23, 2025
arjan-bal modified the milestones: 1.77 Release, 1.78 Release Oct 30, 2025
easwars assigned arjan-bal and unassigned easwars Oct 30, 2025
