Do not GC the current active incremental session directory #147821

iximeow · 2025-10-17T18:03:59Z

when building a relatively large repo (https://github.com/oxidecomputer/omicron) on illumos under heavy CPU pressure, i saw some rustc invocations die like:

[..]/target/debug/incremental/<crate>-<hash>/<name>/dep-graph.part.bin: No such file or directory (os error 2)

a bit of debugging later and it seems that if the system is very slow, Unix-flavored flock::Lock::new() doesn't quite get the mutual exclusion garbage_collect_session_directories expects. before this patch i could reproduce this with the crate nexus_db_queries (in that repo) by pinning the full cargo build to one core and having a busy loop fighting on that same core. with this patch i cannot reproduce the issue. i took a look at how flock::Lock is used and i think this is the only problematic use, so i figure i'll propose this change particularly since i don't think file locking can be made.. good... for Unix in general.

In setup_dep_graph, we set up a session directory for the current incremental compilation session, load the dep graph, and then GC stale incremental compilation sessions for the crate. The freshly created session directory ends up in the list of potentially-GC'd directories, but in practice it is not usually even considered for GC, because the new directory is neither finalized nor is_old_enough_to_be_collected.

Unfortunately, is_old_enough_to_be_collected is a simple time check, and if load_dep_graph is slow enough, the freshly created session directory can already be tens of seconds old. Once it is old enough to be eligible for GC, we try to take a flock::Lock on it as proof that no one else owns it and that it is therefore a stale working directory.
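A minimal sketch of the eligibility check described above, using a hypothetical `SessionDir` type and an assumed 10-second threshold (the real types and threshold in `rustc_incremental::persist::fs` may differ). It shows that a pure age check cannot tell a stale directory apart from a live one whose `load_dep_graph` was slow.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical stand-in for a session directory seen during GC.
struct SessionDir {
    name: String,
    finalized: bool,
    created: SystemTime,
}

/// Assumed GC age threshold; rustc's actual value may differ.
const GC_AGE_THRESHOLD: Duration = Duration::from_secs(10);

/// A purely time-based check, like the one described above.
fn is_old_enough_to_be_collected(created: SystemTime, now: SystemTime) -> bool {
    now.duration_since(created)
        .map(|age| age > GC_AGE_THRESHOLD)
        .unwrap_or(false) // `now` earlier than `created`: treat as young
}

/// True if an unfinalized directory moves on to the "try to lock it" step.
/// Nothing here knows whether the directory belongs to the current session.
fn is_lock_candidate(dir: &SessionDir, now: SystemTime) -> bool {
    !dir.finalized && is_old_enough_to_be_collected(dir.created, now)
}

fn main() {
    let created = SystemTime::now();
    // Our own, still-in-use session directory.
    let ours = SessionDir {
        name: "s-current".to_string(),
        finalized: false,
        created,
    };

    // Right after creation it is not a candidate.
    assert!(!is_lock_candidate(&ours, created + Duration::from_secs(1)));

    // After a slow load_dep_graph (30 s here, under CPU pressure) it is.
    assert!(is_lock_candidate(&ours, created + Duration::from_secs(30)));
    println!("{} became a GC lock candidate", ours.name);
}
```

From that point on, the only thing standing between the live directory and deletion is the lock attempt.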

Because the lock on that directory is already held by this same process, the outcome of flock::Lock depends on platform-specific file-locking semantics. The fcntl(F_SETLK)-style locks used on non-Linux Unices do not provide mutual exclusion within a process. fcntl_locking(2) on Linux describes some relevant problems:

       The record locks described above are associated with the process
       (unlike the open file description locks described below).  This
       has some unfortunate consequences:

       *  If a process closes any file descriptor referring to a file,
          then all of the process's locks on that file are released, [...]

       *  The threads in a process share locks.  In other words, a
          multithreaded program can't use record locking to ensure that
          threads don't simultaneously access the same region of a file.

Locking the fresh incremental compilation directory with fcntl locks therefore appears to succeed, at which point we may remove the directory just before we use it for incremental compilation. Saving incremental compilation state later fails, taking rustc down with an error like

[..]/target/debug/incremental/crate-<hash>/<name>/dep-graph.part.bin: No such file or directory (os error 2)

The release-lock-on-close behavior has uncomfortable consequences for the freshly opened file description used for the lock, but I think in practice it isn't an issue: if we would close the file, it's because we failed to acquire the lock, so someone else held the lock and we're not releasing locks prematurely.
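To make the fcntl problem concrete, here is a toy in-memory model (no real syscalls) of process-associated record locks. It assumes only the two properties quoted from fcntl_locking(2) matter: a process that already holds the lock can "acquire" it again, and closing any descriptor drops the process's lock. `RecordLocks`, the pids, and the lock file name are hypothetical.

```rust
use std::collections::HashMap;

/// Toy model of fcntl(F_SETLK)-style record locks: a lock belongs to a
/// (process, file) pair, not to any particular file descriptor.
#[derive(Default)]
struct RecordLocks {
    // file path -> pid of the process holding the exclusive lock
    owners: HashMap<String, u32>,
}

impl RecordLocks {
    /// Non-blocking exclusive lock attempt by `pid` through any descriptor.
    /// Succeeds if the file is unlocked or already locked by the same process.
    fn try_lock(&mut self, pid: u32, file: &str) -> bool {
        let owner = self.owners.get(file).copied();
        if matches!(owner, Some(o) if o != pid) {
            return false; // held by a different process
        }
        self.owners.insert(file.to_string(), pid);
        true
    }

    /// Closing any descriptor for `file` drops all of `pid`'s locks on it.
    fn close_any_fd(&mut self, pid: u32, file: &str) {
        if self.owners.get(file) == Some(&pid) {
            self.owners.remove(file);
        }
    }
}

fn main() {
    let mut locks = RecordLocks::default();
    let lock_file = "s-current.lock"; // hypothetical lock file name
    let ours = 100; // our rustc process
    let other = 200; // some other rustc process

    // Session setup: our process locks its session's lock file.
    assert!(locks.try_lock(ours, lock_file));
    // A different process is correctly excluded.
    assert!(!locks.try_lock(other, lock_file));
    // GC in the *same* process also "succeeds", wrongly suggesting staleness.
    assert!(locks.try_lock(ours, lock_file));

    // Per the first quoted bullet: closing any of our descriptors for the
    // file releases the process's lock, so another process can now take it.
    locks.close_any_fd(ours, lock_file);
    assert!(locks.try_lock(other, lock_file));
}
```

Under this model the GC's lock attempt inside our own rustc process always succeeds, which is exactly the false proof of staleness described above.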

flock(LOCK_EX) doesn't seem to have these same issues: flock(2) locks are associated with an open file description rather than with the process, and because flock::Lock::new always opens a new file description when locking, I don't think Linux can have this issue.
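For contrast, a similar toy model of flock(2)-style locks, assuming each `flock::Lock::new` call corresponds to a fresh `open()` and therefore a fresh open file description (OFD). `OfdLocks` is hypothetical and models only exclusive, non-blocking locking.

```rust
use std::collections::HashMap;

/// Toy model of flock(2)-style locks: a lock belongs to an open file
/// description. Every open() creates a new OFD, even within one process.
#[derive(Default)]
struct OfdLocks {
    next_ofd: u64,
    // file path -> id of the OFD holding the exclusive lock
    owners: HashMap<String, u64>,
}

impl OfdLocks {
    /// Model of open(): returns a new, distinct open file description id.
    fn open(&mut self) -> u64 {
        self.next_ofd += 1;
        self.next_ofd
    }

    /// Model of flock(fd, LOCK_EX | LOCK_NB) through `ofd`.
    fn try_lock_ex(&mut self, ofd: u64, file: &str) -> bool {
        let owner = self.owners.get(file).copied();
        if matches!(owner, Some(o) if o != ofd) {
            return false; // would block: another OFD holds the lock
        }
        self.owners.insert(file.to_string(), ofd);
        true
    }

    /// Closing an OFD releases only the lock held through that OFD.
    fn close(&mut self, ofd: u64, file: &str) {
        if self.owners.get(file) == Some(&ofd) {
            self.owners.remove(file);
        }
    }
}

fn main() {
    let mut locks = OfdLocks::default();
    let lock_file = "s-current.lock"; // hypothetical lock file name

    // Session setup opens the lock file (new OFD) and locks it.
    let session_ofd = locks.open();
    assert!(locks.try_lock_ex(session_ofd, lock_file));

    // GC in the same process opens the file again: a new OFD, so it is refused.
    let gc_ofd = locks.open();
    assert!(!locks.try_lock_ex(gc_ofd, lock_file));

    // Closing the GC's OFD leaves the session's lock in place.
    locks.close(gc_ofd, lock_file);
    let probe = locks.open();
    assert!(!locks.try_lock_ex(probe, lock_file));
}
```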

From reading LockFileEx on MSDN I think Windows has locking semantics similar to flock, but I haven't tested there at all.

My conclusion is that there is no way to write a pure-POSIX flock::Lock::new which guarantees mutual exclusion across different file descriptions of the same file in the same process, and flock::Lock::new must not be used for that purpose. So, instead, avoid considering the current incremental session directory for GC in the first place. Our own sess is evidence we're alive and using it.
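The fix amounts to filtering by name before any age or lock check. A minimal sketch, assuming a hypothetical `gc_candidates` helper over plain directory names; the actual patch compares `directory_name` against `current_session_directory_name` inside the existing GC loop.

```rust
/// Hypothetical candidate filter for session-directory GC.
/// `current_session_dir_name` stands for the name of the directory this
/// rustc session created (the real code derives it from `sess`).
fn gc_candidates<'a>(
    all_session_dirs: &'a [&'a str],
    current_session_dir_name: &str,
) -> Vec<&'a str> {
    all_session_dirs
        .iter()
        .copied()
        // Skip our own session's directory: it is live because we are using
        // it, and a same-process lock attempt cannot prove otherwise.
        .filter(|name| *name != current_session_dir_name)
        .collect()
}

fn main() {
    let dirs = ["s-old-abc", "s-current-xyz", "s-other-def"];
    let candidates = gc_candidates(&dirs, "s-current-xyz");
    assert_eq!(candidates, vec!["s-old-abc", "s-other-def"]);
    println!("{candidates:?}");
}
```

Because the exclusion happens before `is_old_enough_to_be_collected` and the lock attempt, it no longer matters how slow `load_dep_graph` was or which locking semantics the platform provides.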

rustbot · 2025-10-17T18:04:05Z

r? @nnethercote

rustbot has assigned @nnethercote.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

nnethercote · 2025-10-20T00:35:44Z

compiler/rustc_incremental/src/persist/fs.rs


+            if directory_name.as_str() == current_session_directory_name {
+                // Skip our own session's directory: we know it's not garbage
+                // because we're using it.


This comment explains why it's valid to skip the directory, but not why we would want to. You wrote a nice PR description; is it possible to expand this comment a little? The bit about the new session directory possibly being tens of seconds old seems like a key part of it.

An alternative would be to say "see #147821 for details". Maybe both would be good.

nnethercote · 2025-10-20T02:17:46Z

Thanks.

@bors r+ rollup

bors · 2025-10-20T02:17:48Z

📌 Commit 210d4ad has been approved by nnethercote

It is now in the queue for this repository.

iximeow · 2025-10-20T02:30:27Z

just want to say it's very cool that tidy caught me misspelling session, sorry about that. (./x.py gets Solaris artifacts for illumos and then ./x.py tidy goes very sideways, and i haven't gotten to fixing that yet, so i'd skipped tidy Just This One Time...)

nnethercote · 2025-10-20T03:34:14Z

@bors r+ rollup

bors · 2025-10-20T03:34:16Z

📌 Commit 4e816d8 has been approved by nnethercote

It is now in the queue for this repository.

Rollup of 3 pull requests

Successful merges:

- #146167 (Deny-by-default never type lints)
- #147382 (unused_must_use: Don't warn on `Result<(), Uninhabited>` or `ControlFlow<Uninhabited, ()>`)
- #147821 (Do not GC the current active incremental session directory)

r? `@ghost`
`@rustbot` modify labels: rollup

Rollup merge of #147821 - iximeow:ixi/session-gc-vs-flock, r=nnethercote

Do not GC the current active incremental session directory

rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 17, 2025

rustbot assigned nnethercote Oct 17, 2025

rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Oct 17, 2025

nnethercote reviewed Oct 20, 2025

compiler/rustc_incremental/src/persist/fs.rs (outdated, resolved)

nnethercote reviewed Oct 20, 2025

iximeow force-pushed the ixi/session-gc-vs-flock branch 2 times, most recently from 76cddb9 to 210d4ad October 20, 2025 02:16

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 20, 2025

iximeow force-pushed the ixi/session-gc-vs-flock branch from 210d4ad to b52d232 October 20, 2025 02:28

iximeow force-pushed the ixi/session-gc-vs-flock branch from b52d232 to 4e816d8 October 20, 2025 02:40

Zalathar mentioned this pull request Oct 20, 2025

Rollup of 3 pull requests #147900

Merged

bors merged commit 95279f9 into rust-lang:master Oct 20, 2025
11 checks passed

rustbot added this to the 1.92.0 milestone Oct 20, 2025
