Do not GC the current active incremental session directory #147821
r? @nnethercote

rustbot has assigned @nnethercote.
    if directory_name.as_str() == current_session_directory_name {
        // Skip our own session's directory: we know it's not garbage
        // because we're using it.
This comment explains why it's valid to skip the directory, but not why we would want to. You wrote a nice PR description; is it possible to expand this comment a little? The bit about the new session directory possibly being 10s of seconds old seems like a key part of it.
An alternative would be to say "see #147821 for details". Maybe both would be good.
76cddb9 to 210d4ad

Thanks. @bors r+ rollup
210d4ad to b52d232

just want to say it's very cool that tidy caught me misspelling
In `setup_dep_graph`, we set up a session directory for the current
incremental compilation session, load the dep graph, and then GC stale
incremental compilation sessions for the crate. The freshly-created
session directory ends up in this list of potentially-GC'd directories
but in practice is not typically even considered for GC because the new
directory is neither finalized nor `is_old_enough_to_be_collected`.

Unfortunately, `is_old_enough_to_be_collected` is a simple time check,
and if `load_dep_graph` is slow enough it's possible for the
freshly-created session directory to be tens of seconds old already.
Then, old enough to be *eligible* to GC, we try to `flock::Lock` it as
proof it is not owned by anyone else, and so is a stale working
directory.
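
To make the timing hazard concrete, here is a minimal sketch of a purely age-based eligibility check. The 10-second threshold, the constant name, and the signature (taking `now` as a parameter so the example is deterministic) are assumptions for illustration, not a copy of rustc's code.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical threshold, chosen only for illustration.
const GC_AGE_THRESHOLD: Duration = Duration::from_secs(10);

/// Returns true when a directory created at `created` is older than the
/// threshold as seen at `now`. Nothing here knows who owns the directory.
fn is_old_enough_to_be_collected(created: SystemTime, now: SystemTime) -> bool {
    match now.duration_since(created) {
        Ok(age) => age > GC_AGE_THRESHOLD,
        // `created` lies after `now`: treat the directory as too young.
        Err(_) => false,
    }
}

fn main() {
    let created = SystemTime::UNIX_EPOCH + Duration::from_secs(1_000);

    // The dep graph loads quickly: GC runs 2 seconds after directory creation.
    let fast = created + Duration::from_secs(2);
    // The dep graph loads slowly: GC runs 30 seconds after directory creation.
    let slow = created + Duration::from_secs(30);

    println!("fast load, eligible: {}", is_old_enough_to_be_collected(created, fast));
    println!("slow load, eligible: {}", is_old_enough_to_be_collected(created, slow));
}
```

In this sketch the session's own fresh directory becomes eligible as soon as enough wall-clock time has passed, which is why the age check alone cannot protect it.
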
Because we hold the lock in the same process, the behavior of
`flock::Lock` is dependent on platform-specifics about file locking
APIs. `fcntl(F_SETLK)`-style locks used on non-Linux Unices do not
provide mutual exclusion internal to a process. `fcntl_locking(2)` on
Linux describes some relevant problems:
```
The record locks described above are associated with the process
(unlike the open file description locks described below). This
has some unfortunate consequences:
* If a process closes any file descriptor referring to a file,
then all of the process's locks on that file are released, [...]
* The threads in a process share locks. In other words, a
multithreaded program can't use record locking to ensure that
threads don't simultaneously access the same region of a file.
```
`fcntl` locks will appear to successfully lock the fresh incremental
compilation directory, at which point we can remove it just before using
it later for incremental compilation. Saving incremental compilation
state later fails and takes rustc down with it, with an error like
```
[..]/target/debug/incremental/crate-<hash>/<name>/dep-graph.part.bin: No such file or directory (os error 2)
```
The release-lock-on-close behavior has uncomfortable consequences for
the freshly-opened file description for the lock, but I think in
practice it isn't an issue. If we would close the file, we failed to
acquire the lock, so someone else had the lock and we're not releasing
locks prematurely.

`flock(LOCK_EX)` doesn't seem to have these same issues, and because
`flock::Lock::new` always opens a new file description when locking, I
don't think Linux can have this issue.
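
The difference between the two lock styles can be illustrated with a toy model of lock ownership. This is only a simulation in plain Rust and makes no locking system calls; the `Owner` and `LockTable` types, the lock file name, and the numeric IDs are all hypothetical. It models record locks as owned by the process and `flock`-style locks as owned by one open file description.

```rust
use std::collections::HashMap;

/// Who counts as the owner of a lock in this toy model.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Owner {
    /// `fcntl(F_SETLK)`-style record lock: owned by the whole process.
    Process(u32),
    /// `flock`-style lock: owned by a single open file description.
    Description(u32),
}

/// A toy lock table keyed by file name. Not real file locking.
#[derive(Default)]
struct LockTable {
    held: HashMap<String, Owner>,
}

impl LockTable {
    /// Try to take an exclusive lock. Succeeds if the file is unlocked or
    /// is already locked by the very same owner.
    fn try_lock(&mut self, path: &str, owner: Owner) -> bool {
        match self.held.get(path) {
            Some(existing) if *existing != owner => false,
            _ => {
                self.held.insert(path.to_string(), owner);
                true
            }
        }
    }
}

fn main() {
    let pid = 42;
    let lock_file = "session.lock"; // hypothetical name

    // Record-lock model: the session setup and the GC pass run in the same
    // process, so both requests have the same owner and both "succeed".
    let mut record_locks = LockTable::default();
    assert!(record_locks.try_lock(lock_file, Owner::Process(pid)));
    let gc_got_lock = record_locks.try_lock(lock_file, Owner::Process(pid));
    println!("record-lock model, GC acquired lock: {}", gc_got_lock);

    // flock model: every open() yields a new open file description, so the
    // GC pass is a different owner and is refused.
    let mut flock_locks = LockTable::default();
    assert!(flock_locks.try_lock(lock_file, Owner::Description(1)));
    let gc_got_lock = flock_locks.try_lock(lock_file, Owner::Description(2));
    println!("flock model, GC acquired lock: {}", gc_got_lock);
}
```

Under the process-owned model the GC pass cannot tell its own session's lock apart from no lock at all, which matches the failure described above.
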
From reading `LockFileEx` on MSDN I *think* Windows has locking
semantics similar to `flock`, but I haven't tested there at all.

My conclusion is that there is no way to write a pure-POSIX
`flock::Lock::new` which guarantees mutual exclusion across different
file descriptions of the same file in the same process, and
`flock::Lock::new` must not be used for that purpose. So, instead, avoid
considering the current incremental session directory for GC in the
first place. Our own `sess` is evidence we're alive and using it.
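
As a rough sketch of that approach, the candidate selection can filter out the current session's directory by name before any age or lock check is consulted. The `SessionDir` struct, the `gc_candidates` function, and the directory names below are hypothetical simplifications, not the real `garbage_collect_session_directories` code.

```rust
/// Hypothetical, simplified view of one unfinalized session directory.
struct SessionDir {
    name: String,
    /// Outcome of the age check. This can be true even for our own
    /// directory when loading the dep graph was slow.
    old_enough: bool,
}

/// Returns the names of directories that may be considered for GC.
fn gc_candidates<'a>(dirs: &'a [SessionDir], current_session_directory_name: &str) -> Vec<&'a str> {
    dirs.iter()
        // Skip our own session's directory: we are using it, so it is
        // not garbage no matter how old it looks.
        .filter(|d| d.name.as_str() != current_session_directory_name)
        // Only then apply the age-based eligibility check.
        .filter(|d| d.old_enough)
        .map(|d| d.name.as_str())
        .collect()
}

fn main() {
    let dirs = vec![
        SessionDir { name: "s-old-working".to_string(), old_enough: true },
        SessionDir { name: "s-current-working".to_string(), old_enough: true },
        SessionDir { name: "s-new-working".to_string(), old_enough: false },
    ];

    let candidates = gc_candidates(&dirs, "s-current-working");
    println!("GC candidates: {:?}", candidates);
}
```

The name comparison does not depend on any platform's locking semantics, so it behaves the same wherever the lock check would have been unreliable.
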
b52d232 to 4e816d8

@bors r+ rollup
Rollup of 3 pull requests

Successful merges:

- #146167 (Deny-by-default never type lints)
- #147382 (unused_must_use: Don't warn on `Result<(), Uninhabited>` or `ControlFlow<Uninhabited, ()>`)
- #147821 (Do not GC the current active incremental session directory)

r? `@ghost`
`@rustbot` modify labels: rollup
Rollup merge of #147821 - iximeow:ixi/session-gc-vs-flock, r=nnethercote

Do not GC the current active incremental session directory

when building a relatively large repo (https://github.com/oxidecomputer/omicron) on illumos under heavy CPU pressure, i saw some rustc invocations die like:

```
[..]/target/debug/incremental/<crate>-<hash>/<name>/dep-graph.part.bin: No such file or directory (os error 2)
```

a bit of debugging later and it seems that if the system is very slow, Unix-flavored `flock::Lock::new()` doesn't quite get the mutual exclusion `garbage_collect_session_directories` expects. before this patch i could reproduce this with the crate `nexus_db_queries` (in that repo) by pinning the full `cargo build` to one core and having a busy loop fighting on that same core. with this patch i cannot reproduce the issue.

i took a look at how `flock::Lock` is used and i think this is the only problematic use, so i figure i'll propose this change particularly since i don't think file locking can be made.. good... for Unix in general.