Skip to content

Conversation

justahero
Copy link
Contributor

This PR adds support for the new WEB_PAGE_OFFSET_CIDR_BLOCKLIST environment variable that holds a list of IPv4 based CIDR blocks. The use case of this env var is to block IPs coming from the same network or IP range. It should help to mitigate scenarios where a high number of distributed requests from multiple clients put a high load onto the application (e.g. bot network) which happen to come from the same IP range. The new block list is meant to complement the existing User Agent & IP block lists.

Note the functionality is similar to the WEB_PAGE_OFFSET_UA_BLOCKLIST env var which blocks requests by matching user agents. It also only applies to the same (expensive) API endpoints that fetch data using pagination. The common experience using the web site should not be affected. The list is specified as a comma separated list of IPv4 based CIDR blocks (e.g. "192.16.0.0/16").

Currently the CIDR block list only supports IPv4 based entries, each block requires at least a host prefix of 16 bits.

@justahero justahero force-pushed the feature/use-cidr-block-list branch 4 times, most recently from 524d71a to cc88169 Compare May 11, 2022 13:37
Copy link
Member

@Turbo87 Turbo87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a technical reason why you only implemented it for IPv4?

@Turbo87 Turbo87 added C-internal 🔧 Category: Nonessential work that would make the codebase more consistent or clear A-backend ⚙️ labels May 11, 2022
justahero added 2 commits May 12, 2022 11:18
This is to reject requests from IP ranges that overload the API
endpoints by fetching information from multiple clients at the same
time.
A recent incident highlighted this issue where a number of `crates?page`
requests hit the ">0.5 failed requests threshold" alert. These requests
timed out due to a high number of expensive requests made. These calls
came in from about 100 different IPs in parallel.

Before this change individual IPs or specific User Agents could be
blocked. These existing approaches have some drawbacks, blocking single
IPs would not have done much here as this is quite ineffective for this
behaviour, while blocking a specific (common) User Agent may prevent
valid usage of the API.

Therefore a list of CIDR blocks should be used to check if IPs belong to
certain ranges that send too many requests to expensive API endpoints.

**Please note** this has to be done very carefully, it's generally a
tool that may block whole regions of the internet.
@justahero justahero force-pushed the feature/use-cidr-block-list branch from 8bee7b9 to dfe0c87 Compare May 12, 2022 09:19
@Turbo87
Copy link
Member

Turbo87 commented May 17, 2022

@justahero just to make sure we're not waiting on each other: did you see the question above about IPv4-only? :)

@justahero
Copy link
Contributor Author

Interesting, indeed I did not see your comment.

The reason for IPv4 based CIDR blocks was to get the block mechanism started while having a safeguard check on the given host prefix value. We could also remove this safeguard, easily add support for both IPv4 & IPv6 CIDR, then we need to be certain that the env var contains reasonable entries. I think that is fair, as this affects only the pagination endpoints.

@Turbo87
Copy link
Member

Turbo87 commented May 17, 2022

yeah, I think it would probably even simplify the code if we used generic CIDRs. unless there are any major downsides that I'm missing I think it'd be best to slightly tweak it to allow both :)

@justahero justahero force-pushed the feature/use-cidr-block-list branch 2 times, most recently from 46d97ea to cbe1c4b Compare May 18, 2022 10:06
This refactors the `WEB_PAGE_OFFSET_CIDR_BLOCKLIST` block list to also
support IPv6 based CIDR blocks.
@justahero justahero force-pushed the feature/use-cidr-block-list branch from cbe1c4b to e0703b0 Compare May 18, 2022 10:20
Copy link
Member

@Turbo87 Turbo87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you! :)

@Turbo87 Turbo87 merged commit 3e32561 into rust-lang:master May 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-backend ⚙️ C-internal 🔧 Category: Nonessential work that would make the codebase more consistent or clear
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants