Skip to content

Conversation

@nolanderc
Copy link
Contributor

@nolanderc nolanderc commented Mar 18, 2024

When adding a dependency using zig fetch --save I usually just want the latest thing on the master/main branch. However, when running zig fetch --save git+https://github/abc/def#master this gets saved verbatim in build.zig.zon:

.@"def" = .{
    .url = "git+https://github.com/abc/def#master",
    .hash = "...",
},

Now, when compiling this in the future, the branch may point to another commit, leading to a hash-mismatch.

Old Proposal

Context for the discussion below

Old Proposal

This PR adds the new flag --resolve-commit to zig fetch which instead saves whatever commit the refspec points at instead:

.@"def" = .{
    .url = "git+https://github.com/abc/def#123403399058fd83f7fbb597f3f8304007ff6a3c",
    .hash = "...",
},

Naming

I'm open to other names for this flag. Some alternatives I've come up with:

  • --save-commit
  • --latest

Open Questions

Are there any drawbacks to making this the default behavior and instead add a flag to restore the old behaviour?

New Proposal

This PR makes it so that zig fetch --save <URI>#<ref> now replaces the <ref> with the commit SHA in build.zig.zon. The old <ref> is moved to a query parameter ?ref=<ref> so that automated tooling still can check for changes upstream. See the discussion below for the rationale behind this decision. The example above would now become:

.@"def" = .{
    .url = "git+https://github.com/abc/def?ref=master#123403399058fd83f7fbb597f3f8304007ff6a3c",
    .hash = "...",
},

It is possible to restore the old behavior by adding the --preserve-url flag to the zig fetch invocation. This saves the URL verbatim to build.zig.zon.

@rohlem
Copy link
Contributor

rohlem commented Mar 19, 2024

I'm not too familiar with zig fetch, so I can't speak for the intended design, but here's my 2c idea/take:

Are there any drawbacks to making this the default behavior and instead add a flag to restore the old behaviour?

The current behavior informs the user when the remote was modified, which I think would be valid for use cases always wanting the newest commit (f.e. to be informed of bugfixes and safety/security patches upstream).

The naming --latest would sound ambiguous to me: --current-latest-commit fits the new behavior while --always-fetch-latest describes the old one.
Actually, I would propose the following idea as a most explicit interface:

  • Have flags for both behaviors,
    • Either named by user impact, i.e. --always-fetch-latest and --current-latest
    • or by technical background, --keep-refspec and --resolve-refspec (oid, refspec, commit id - not sure which naming is the most correct and understandable)
  • Always check whether the given URL explicitly refers to a fixed commit id (I assume we can do this in a standardized manner?)
    • If it doesn't (the commit referred to by this URL could change in the future), require the user to specify one of the flags to choose the behavior they want.

@Cloudef
Copy link
Contributor

Cloudef commented Mar 20, 2024

Considering how zig package manager works, I think this behavior should be the default. Update checking / security vuln checking should be a separate tool or pass imo.

@nolanderc
Copy link
Contributor Author

@Cloudef Considering how zig package manager works, I think this behavior should be the default.

I'm inclined to agree. The package manager expects that the contend pointed to by the url field matches the hash, and with a branch that is not necessarily the case as the commit a branch references could change. I'd expect most people don't want their builds to randomly break when upstream suddenly push a new commit.

@rohlem The current behavior informs the user when the remote was modified, which I think would be valid for use cases always wanting the newest commit (f.e. to be informed of bugfixes and safety/security patches upstream).

This may work fine in a top-level application, but what happens if this is in a library with transitive dependencies? Consider three projects A -> B -> C, where A depends on a fixed commit in B and B depends on a branch in C. In the case where C pushes an update to its branch, this would suddenly cause the build in A to break as it inherits B's dependencies, even though A used a fixed commit. To fix, this would require a patch to B, which either requires forking the project or waiting for an upstream patch to be accepted. Neither case is great.

@Cloudef Update checking / security vuln checking should be a separate tool or pass imo.

In order to support updating/scanning we would have to preserve the original name of the branch/tag in addition to the latest commit. For example, running the command zig fetch --save git+https://github/abc/def#master we would add the following to build.zig.zon:

.@"def" = .{
    .url = "git+https://github.com/abc/def#123403399058fd83f7fbb597f3f8304007ff6a3c",
    .refspec = "master",
    .hash = "...",
},

Where refspec (naming TDB) is the branch/tag all updates should be done against. One can then envision a command such as zig fetch --update def which updates the def to the latest commit pointed to by the master branch on the remote.

@Cloudef
Copy link
Contributor

Cloudef commented Mar 20, 2024

Since the git url already has special syntax, another option would be to use query parameters to encode more information for example.

Stores the original ref as a query parameter in the URL so that it is
possible to automatically check the upstream if there are any newer
commits.

Also adds a flag which opts-out of the new behaivour, restoring the old.
@nolanderc
Copy link
Contributor Author

Since the git url already has special syntax, another option would be to use query parameters to encode more information for example.

That's a nice solution! I updated the PR:

  • Commits are now resolved by default.
  • The original ref is stored as a query parameter ?ref=...
  • There's a new flag --preserve-url which keeps the original URL, restoring the old behavior.

@andrewrk
Copy link
Member

Nice work!

@andrewrk andrewrk merged commit 7fa2357 into ziglang:master May 10, 2024
andrewrk added a commit that referenced this pull request May 10, 2024
This reverts commit 7fa2357, reversing
changes made to cb77bd6.
@andrewrk
Copy link
Member

Reverted in 6ca4ed5. When I went to fix the compile errors, I noticed a bunch more problems that I didn't see the first time around and I will now do a post-hoc review.

Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is good, thanks for working on it.

The implementation needs to be updated to the latest master branch (merging it caused compile errors due to changes in the std.Uri API since this PR was opened).

Also there is one more thing to consider: URLs that use git tags. Those should probably be preserved by default. If the git tag is modified such that the package now has a different set of files, that should probably trigger a hash mismatch failure.

Comment on lines 6777 to +6778
var save: union(enum) { no, yes, name: []const u8 } = .no;
var preserve_url: bool = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be incorporated into the type of save. I think a better set of flags would be --save-exact and --save-exact=[name].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #19941

Comment on lines +6930 to +6940
if (fetch.latest_commit) |*latest_commit| {
var uri = try std.Uri.parse(path_or_url);
const target_ref = uri.fragment orelse "";
if (!std.mem.eql(u8, target_ref, latest_commit)) {
std.log.info("resolved ref '{s}' to commit {s}", .{
target_ref,
std.fmt.fmtSliceHexLower(latest_commit),
});

if (!preserve_url) {
if (target_ref.len != 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too much nesting. Give the block a label and use break statements.

Don't use string length 0 to mean null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #19941

Comment on lines +6942 to +6945
var query = try std.ArrayList(u8).initCapacity(arena, 4 + target_ref.len);
try std.Uri.writeEscapedQuery(query.writer(), "ref=");
try std.Uri.writeEscapedQuery(query.writer(), target_ref);
uri.query = try query.toOwnedSlice();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std.Uri.Component supports format() so you can instead do something like this:

std.fmt.allocPrint(arena, "ref={}", .{target_ref}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #19941

nolanderc added a commit to nolanderc/zig that referenced this pull request May 11, 2024
andrewrk pushed a commit that referenced this pull request May 11, 2024
* Revert "Revert "Merge pull request #19349 from nolanderc/save-commit""

This reverts commit 6ca4ed5.

* update to new URI changes, rework `--save` type

* initialize `latest_commit` to null everywhere
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants