(Apologies if I've filed this in the wrong place; happy to move it wherever it makes sense.)
There's been some interest lately in ensuring that code uploaded to crates.io is the same as the code in the repository on GitHub.
Aside on why this is worth doing
Most people who are interested in a crate's code will look at the GitHub (or similar) repository, not the code uploaded to crates.io (citation needed, but I know I do this and I assume most others do too). This means that the code that's actually running is looked at by comparatively fewer people, and authors of malicious crates can make their vulnerabilities less likely to be discovered. This has happened in practice with the event-stream NPM package. Comparing the published crate to the GitHub source ensures that any malicious code must be visible when people go looking for it.
This doesn't need to be done by Cargo, of course, but Cargo's current method of generating crate files makes it difficult for any tool to do this.
Ideally, .crate files would be bit-for-bit reproducible. If that were the case, this would be as simple as downloading the .crate file, cloning the source, running cargo package, and comparing hashes. #8864 made it most of the way there, but it fails in practice (at least with the crates I tested, the latest versions of hyper and rand), because the Cargo.lock file in the uploaded crate differs from the newly generated one. The crates follow the official guidance to omit the file from the repository (because they're libraries), so my Cargo generates a new one on the fly, picking up any versions of dependencies released since the crate was uploaded. Therefore, there's a mismatch.
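As a rough sketch of the "comparing hashes" step, assuming the two .crate files are already on disk: the function names and any paths passed in are hypothetical, and SHA-256 is just one reasonable choice of hash. This only gives a useful answer if packaging is bit-for-bit reproducible.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def crates_match(downloaded: Path, rebuilt: Path) -> bool:
    """True only if the two .crate files are byte-for-byte identical."""
    return sha256_of(downloaded) == sha256_of(rebuilt)
```

Here `downloaded` would be the file fetched from the registry and `rebuilt` the one produced by running cargo package in a fresh clone. With the Cargo.lock mismatch described above, this check would currently report a difference even for honest crates.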
I see a few solutions here:
- Use a more sophisticated method to do the comparisons, ignoring Cargo.lock. This would work, but since this is a security-related feature, more complexity is more attack surface, and malicious code could potentially be hidden either by a bug in the comparison algorithm or if it's somehow possible to do something malicious with just a Cargo.lock file (maybe if it isn't actually TOML, but contains Rust code or a binary?)
- Change the guidance to suggest checking in Cargo.lock files, even for libraries. I think some libraries are doing this anyway, to avoid dependency updates breaking their CI at arbitrary times? But this would be a major departure from current guidance for a relatively small gain, and it would likely take a long time for maintainers to adapt.
- Omit the Cargo.lock from .crate files for crates that only contain libraries. I assume this file is never actually read unless the crate in question was passed directly to cargo install? If so, this seems like the best way forward.
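For reference, the first option (comparing while ignoring Cargo.lock) might look roughly like the sketch below. It relies on .crate files being gzip-compressed tar archives whose entries sit under a single top-level directory; the function names are hypothetical, and it compares file contents only, not tar metadata such as modes or timestamps. Those gaps are exactly the kind of extra complexity that worries me about this approach.

```python
import hashlib
import tarfile
from pathlib import PurePosixPath


def member_digests(crate_path, ignore_lockfile=True):
    """Map each entry in a .crate archive to a digest of its contents."""
    digests = {}
    with tarfile.open(crate_path, "r:gz") as archive:
        for member in archive.getmembers():
            parts = PurePosixPath(member.name).parts
            # Skip only the top-level lockfile, e.g. "demo-0.1.0/Cargo.lock".
            if ignore_lockfile and len(parts) == 2 and parts[1] == "Cargo.lock":
                continue
            if member.isfile():
                data = archive.extractfile(member).read()
                digests[member.name] = hashlib.sha256(data).hexdigest()
            else:
                # Record non-regular entries (directories, links) by type and
                # target so they cannot be silently added or changed.
                digests[member.name] = ("non-file", member.type, member.linkname)
    return digests


def same_except_lockfile(crate_a, crate_b):
    """True if both archives hold the same entries, ignoring Cargo.lock."""
    return member_digests(crate_a) == member_digests(crate_b)
```

Note that this never inspects the Cargo.lock itself, so it does nothing about the "is a Cargo.lock alone enough to be malicious?" question raised in the first option.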