Description
On August 14, 2025, Ted Unangst posted a blog post “what is the go proxy even doing?,” questioning the amount of traffic he has observed on his server humungus.tedunangst.com from proxy.golang.org. This issue tracks work to understand and fix the reported traffic.
Action Items
- disable refresh traffic for humungus
- add a proxy.golang.org FAQ entry for where to send reports of excessive traffic
- look for newer example of “thundering herd” / “flood” mentioned in blog post
- cmd/go: implement Mercurial support for go mod download -reuse, go list -reuse #75119
- proxy.golang.org: revise background refresh pacing #75191
Go module mirror background
The Go module mirror caches modules for the Go ecosystem and is used by default in the go command, for increased availability and to reduce load on individual servers. To reduce latency as well as improve availability, the module mirror refreshes its cached information on a regular basis as long as users have requested that cached information recently.
Relevant details about the module mirror’s cache operation and refresh policies include:
- Module zip files containing recognized LICENSE files that allow redistribution are cached indefinitely and never refreshed.
- Module zip files without recognized LICENSE files allowing redistribution are only held in cache for 30 days and then expire. (These module versions can be recognized by having no recognized license and no displayed docs on pkg.go.dev.) If a user request arrives for a zip file not in cache, the zip file is obtained on demand, cached, and then served back; the user sees this as high latency, as well as reduced availability if the upstream server is not available at that moment. To improve both latency and availability, the module mirror aims to refresh these module zip files once they are 25 days old, but only if they have been accessed recently: module versions that stop being accessed stop being refreshed.
- The module mirror also caches version lists, as displayed by “go list -m -versions <module>” and also used by “go get -u”, “govulncheck”, and other tools. Cached version lists expire after 30 minutes, and the module mirror refreshes them when they reach 25 minutes old, but again only if they have been accessed recently: module lists that stop being accessed stop being refreshed. The much shorter expiry is to reduce the amount of time required for “go get -u” to see the latest version.
- The module mirror also caches the result of branch and latest queries, used by “go get module@main” and by “go get module@latest” for modules with no tagged versions. These cached results also expire after 30 minutes and are refreshed at 25 minutes, again only if they have been accessed recently: module queries that stop being accessed stop being refreshed. And again, the much shorter expiry is to reduce the amount of time that users see stale information.
- For scaling purposes as well as failure containment, all the individual pieces of cached module information are stored and refreshed as independent units. That is, each module version zip file is handled separately, as is the version list and the result of each version query.
- The module mirror does not store cached copies of entire upstream version control repositories. It only caches the output of “go mod download” and “go list -json -m --versions”. The repository cache is internal to the go command, and the module mirror does not break through that abstraction. This does mean that each refresh must re-download what it needs from the upstream repository, but the implementation limits the size of that information in a few key ways, especially when using Git.
- When using Git, a fetch of a specific module version uses “git clone --depth=1”, to avoid downloading any repo history.
- When using Git, a fetch of the version lists does need repo history, to understand where the go.mod files are in the repository and when they existed.
- When using Git, both of those fetches are further optimized by saving a hash of a subset of the information available from “git ls-remote”. Refreshes run “git ls-remote” and recompute that hash. If it matches the cached hash, then the cached information is still up-to-date and can have its expiry extended, without downloading any extra data from the repository. The “go mod download -reuse” and “go list -reuse” flags implement this lightweight check, but only for Git.
- When using Mercurial, these refreshes do full repository clones, because Mercurial does not provide something like “git ls-remote”. (“hg identify” has “--tags” and “--branches” flags, but these are disabled for remote repositories.)
- When using Subversion, refreshes are cheap since Subversion never downloads full repository history.
- When using Fossil and Bazaar, refreshes must download full repository information.
- In the past, we have disabled refresh entirely for domains that reported receiving too much refresh traffic. Their modules are still cached, but refreshes only happen on demand, during actual requests when the cached information is determined to be too old. For example, the discussion on “proxy.golang.org: Unusual traffic to git hosting service from Go” #44577 (before we implemented the lightweight Git refreshes) led to adding git.lubar.me and git.sr.ht to the no-refresh list.
- The module mirror keeps logs related to module fetches and refreshes for only a small number of weeks, for privacy and operational reasons.
Initial Investigation
The blog post reported traffic sent to humungus.tedunangst.com. Unfortunately, the post on 2025-08-14 is reporting traffic from 2025-05-19, three months earlier. We no longer have logs from then, making it hard to piece together the exact root causes for the observed requests.
One important detail is that humungus sends an HTTP 429 (StatusTooManyRequests) in response to “hg clone” of a given repo from a Google IP address, unless it has been 24 hours since the last clone of that repo. I assume the server has done this since around 2025-03-04, when this code was committed.
Traffic to humungus has never been reported to us as problematic, except for the blog post. That is perhaps our fault, as there is not an obvious answer to where to send such a report. Others have used this issue tracker, but we could add a FAQ entry to proxy.golang.org offering that as an explicit option, perhaps with an email option for people who do not want to post publicly. In any event, humungus was not on the no-refresh list mentioned in the previous section. After preliminary investigation, we added humungus to the no-refresh list on August 21, but we continued to investigate root causes.
Our investigation has shown the following:
- Some modules on humungus have LICENSE files, while others do not. For example gozstd has a LICENSE, meaning it displays docs and does not use module zip file refreshes, while webs does not have a LICENSE, meaning it does not display docs and does use module zip refreshes.
- Humungus hosts 32 repos (not all Go modules) totaling 12 MB. The webs repo has 131 tags; miniwebproxy has 11; and the others all have fewer than 10.
- For 24 hours on 2025-08-19, our logs show 904 total module-related fetches to humungus: 756 list fetches and 148 version fetches. Of these, 133 failed with I/O timeouts while fetching the HTML redirect page that would contain the hg server information, 762 were rejected during hg clone, and 9 succeeded. The total size of the rejected clones (size unpacked on disk, possibly smaller on the wire) would have been about 380 MB, or an average of roughly 35 kbit/s over the course of the day. That’s not a humongous amount of bandwidth but still more than we’d like to cause a small site operator.
- In the 12 hours after we added humungus to the no-refresh list, our logs show 21 module-related fetches: 7 list fetches and 14 version fetches. Of these, 11 had rejected “hg clone” operations. This confirms that the refresh traffic is the problem.
- Mercurial support for -reuse would essentially eliminate all refresh clones and is not as impossible as we originally believed. Issue cmd/go: implement Mercurial support for go mod download -reuse, go list -reuse #75119 tracks implementing it.
Update, 2025-08-25
See comment below. Added link to #75191 in Action Items above.