Description
The test always fails locally for me.
At this point, I believe it never worked.
The issue is that the auth server is configured over https, while the registry is on plain http.
Since the auth server's CA is not trusted, every request to the auth server fails:
level=error msg="failed to call tryLoginWithRegHost" error="failed to call rh.Authorizer.Authorize: failed to fetch oauth token: Post \"https://10.4.0.1:5001/auth\": tls: failed to verify certificate: x509: certificate signed by unknown authority" i=0
This was referenced in passing in #2607
If the registry were served over https, the CA would get appended to the tlsconfig roots, and would likely be reused when contacting the auth server.
However, on a retry during the same overall run, the test apparently passes on the CI.
I am not completely sure why at this point. My gut feeling is that some other test (using the HTTPS registry) is not properly isolated from this one, and somehow the CA is still trusted afterwards?
Anyhow, this is clearly one of those cases where retrying a test is actually a bad idea.
We need to fix this test, which seems to be doing nothing right now.
There is also a large amount of duplication in testregistry_linux.go, which may be responsible for part of the confusion here.
I am also interested in better understanding why we apparently can't specify a hosts.toml for the auth server (from a cursory look, that might be somewhere in containerd/remotes/docker/resolver).
I appreciate that a token server is not a registry, but then how do we specify a CA for it (besides the obvious, adding it to the system store, which is a non-starter)?
Maybe the use-case is nonsensical anyhow... (http registry + https auth server, or both over https using different third-party CAs...)
Finally, I do believe that retrying tests until they pass is a bad idea. If a test fails, it should be looked into; if it is flaky, it should be fixed. Retrying just ignores the underlying problem and makes things harder to understand, all while creating weird side effects and giving us false confidence that tests are actually doing something when they very well might not be. I do appreciate that flakiness is a PITA, but I do not see a better solution than getting to the bottom of each case, one by one...
Steps to reproduce the issue
Run integration tests (specifically cmd/nerdctl/login_linux_test.go)
Describe the results you received and expected
Fails every time.
What version of nerdctl are you using?
1.7.6
Are you using a variant of nerdctl? (e.g., Rancher Desktop)
None
Host information
No response