set TCP_USER_TIMEOUT to 15 seconds for database connections
In the December 22, 2021 crates.io outage, we experienced full packet
loss between the database server and the application server(s). While
the crates.io application supports running without a database during
outages, those mitigations kicked in only a bit more than 15 minutes
after the packet loss started.
The default parameters of the Linux network stack result in a TCP
connection being marked as broken after roughly 15 minutes of no
acknowledgements being received, which is what I believe happened during
that outage.
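
As a rough check on that figure, here is a simplified model of the
retransmission backoff. It assumes the kernel defaults of tcp_retries2 = 15,
a minimum retransmission timeout of 200 ms, and a cap of 120 s; the function
name is made up for illustration, and real connections derive their timeout
from the measured round-trip time, so actual numbers will differ.

```rust
/// Assumed Linux defaults: minimum and maximum retransmission timeout.
const RTO_MIN_MS: u64 = 200;
const RTO_MAX_MS: u64 = 120_000;

/// Hypothetical helper: sums the waits of the initial transmission plus
/// `retries` retransmissions, doubling the timeout each time up to the cap.
fn hypothetical_timeout_ms(retries: u32) -> u64 {
    (0..=retries)
        .map(|attempt| (RTO_MIN_MS << attempt).min(RTO_MAX_MS))
        .sum()
}

fn main() {
    // tcp_retries2 defaults to 15 on Linux.
    let total_ms = hypothetical_timeout_ms(15);
    println!("{:.1} seconds", total_ms as f64 / 1000.0); // prints "924.6 seconds"
}
```

Under these assumptions the connection is given up on after about 924.6
seconds, which is the "roughly 15 minutes" mentioned above.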
Waiting 15 minutes for the broken-database mitigations to kick in is too
long for crates.io: ideally they should kick in as soon as possible, so
that crates.io users can continue downloading crates.
This commit tells libpq (the underlying PostgreSQL client used by
diesel) to set the Linux TCP_USER_TIMEOUT socket option to a
configurable value, which we set to 15 seconds. With this change, if an
outage like the one on Dec 22, 2021 happens again, crates.io will be
fully unavailable for only 15 seconds rather than 15 minutes.
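
A minimal sketch of how such a timeout can be passed to libpq, not
necessarily how this commit does it: libpq accepts a tcp_user_timeout
connection parameter, expressed in milliseconds, which can be appended to
the connection URL. The helper name and the example URL below are
hypothetical.

```rust
use std::time::Duration;

/// Hypothetical helper: appends libpq's `tcp_user_timeout` parameter
/// (in milliseconds) to a PostgreSQL connection URL.
fn with_tcp_user_timeout(database_url: &str, timeout: Duration) -> String {
    // Use `&` if the URL already carries query parameters, `?` otherwise.
    let separator = if database_url.contains('?') { '&' } else { '?' };
    format!(
        "{}{}tcp_user_timeout={}",
        database_url,
        separator,
        timeout.as_millis()
    )
}

fn main() {
    // Placeholder URL, not the real crates.io database.
    let url = with_tcp_user_timeout(
        "postgres://localhost/example_db",
        Duration::from_secs(15),
    );
    println!("{}", url);
}
```

The simple string handling here assumes a URL without a fragment; a real
implementation would more likely use a URL parsing library.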