Skip to content

Boot nginx from the backend app to fix graceful shutdown in production #3417

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 18, 2021

Conversation

jtgeibel
Copy link
Member

In production we aren't gracefully shutting down as expected. nginx
does a quick shutdown when receiving a SIGTERM and once it exits, it
appears that our process is aborted.

Instead of spawning the backend from the shell script, the backend
process is now responsible for spawning the shell script. This allows
the backend to finish its shutdown work even if the script has already
exited.

cat is passed to the nginx buildpack script as our "server", which
will block on STDIN so that nginx will run until receiving
SIGTERM.

r? @pietroalbini

In production we aren't gracefully shutting down as expected. nginx
does a quick shutdown when receiving a `SIGTERM` and once it exits, it
appears that our process is aborted.

Instead of spawning the backend from the shell script, the backend
process is now responsible for spawning the shell script. This allows
the backend to finish its shutdown work even if the script has already
exited.

`cat` is passed to the nginx buildpack script as our "server", which
will block on `STDIN` so that `nginx` will run until receiving
`SIGTERM`.
@jtgeibel
Copy link
Member Author

Deployed and tested on staging:

Mar 16 20:17:59 staging-crates-io app/web.1 downloads_counter shard=21 counted_versions=0 counted_downloads=0 pending_downloads=1
Mar 16 20:18:00 staging-crates-io heroku/web.1 Restarting
Mar 16 20:18:00 staging-crates-io heroku/web.1 State changed from up to starting
Mar 16 20:18:01 staging-crates-io app/web.1 downloads_counter shard=22 counted_versions=0 counted_downloads=0 pending_downloads=1
Mar 16 20:18:01 staging-crates-io heroku/web.1 Stopping all processes with SIGTERM
Mar 16 20:18:01 staging-crates-io app/web.1 Starting graceful shutdown
Mar 16 20:18:01 staging-crates-io app/web.1 Persisting remaining downloads counters
Mar 16 20:18:01 staging-crates-io app/web.1 downloads_counter all_shards counted_versions=1 counted_downloads=1 pending_downloads=0
Mar 16 20:18:01 staging-crates-io app/web.1 Server has gracefully shutdown!
Mar 16 20:18:01 staging-crates-io heroku/web.1 Process exited with status 0

@pietroalbini pietroalbini linked an issue Mar 17, 2021 that may be closed by this pull request
@pietroalbini
Copy link
Member

Thanks for the PR! Your approach works great, but I'm not sure spawning start-web.sh from the application is the best approach from a code organization structure. I'd prefer if Heroku still calls start-web.sh and the script also launches the server in the background, like this:

#!/bin/bash
set -ue

if [[ -z "${USE_FASTBOOT-}" ]]; then
    unset USE_FASTBOOT
else
    export USE_FASTBOOT
    node --optimize_for_size --max_old_space_size=200 fastboot.js &
fi

# Since this script is launched separately, we tell the nginx
# buildpack (`bin/start-nginx`) that `cat` is our server.
bin/start-nginx cat &
target/release/server &
wait -n

@jtgeibel
Copy link
Member Author

I haven't tried that exact script, but I don't think that will work. I tried a dozen different ways to launch it from the script, and couldn't get any of them to work. Launching server in the foreground, or bringing it the foreground with fg. Trapping SIGTERM in the script. Using bash -c "./script/start-web.sh & ./target/release/server" in the Procfile.

I just couldn't convince bash to work. In particular, wait has an unexpected interaction with traps (and wait -n stops waiting as soon as the first child exits).

When bash is waiting for an asynchronous command via the wait builtin, the reception of a signal for which a trap has been set will cause the wait builtin to return immediately with an exit status greater than 128, immediately after which the trap is executed.

I even had a few scripts that seemed to work fine locally, but just didn't work on Heroku. My guess is that maybe I was running into differences between interactive and non-interactive shells. (For example, I had to use set -m to make sure job control was enabled on Heroku.) My other guess, is that maybe Heroku kills all processes on shutdown as soon as the parent process exits.

As strange as it is to launch nginx from our backened app, it was the only approach I could find that ensured our process lives long enough to clean up.

@pietroalbini
Copy link
Member

Ok then, let's get this fixed!

@bors r+

@bors
Copy link
Contributor

bors commented Mar 18, 2021

📌 Commit 0061c5f has been approved by pietroalbini

@bors
Copy link
Contributor

bors commented Mar 18, 2021

⌛ Testing commit 0061c5f with merge e5171ce...

@bors
Copy link
Contributor

bors commented Mar 18, 2021

☀️ Test successful - checks-actions
Approved by: pietroalbini
Pushing e5171ce to master...

@bors bors merged commit e5171ce into rust-lang:master Mar 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle graceful shutdown on Heroku
4 participants