Commit 63f6b94

update to spike limit LA content (#5836)
* updated spike limit LA content
* updated heading
* add link to example
* added bursty projects section
* fix image link
* wording edits
1 parent 7177e8e commit 63f6b94

2 files changed

+35
-21
lines changed


src/docs/product/accounts/quotas/manage-event-stream-guide.mdx

Lines changed: 35 additions & 21 deletions
@@ -111,60 +111,74 @@ To review the error events dropped because of spike protection, go to the "Usage
111111

112112
Events will not be dropped during any minute in which you don't send more than the hourly limit that Sentry has calculated for you. After 24 hours without any dropped events, spike protection becomes "inactive" again. This means that it is no longer dropping events, but _it does not mean the system has stopped paying attention._ The next time events are dropped, spike protection will be "reactivated".
113113

114-
### New Heuristic Changes
114+
### New Spike Protection Calculations
115115

116116
<Include name="limited-avail-note.mdx" />
117117

118-
Spike protection is enabled for every project by default, and when it's enabled, Sentry continually monitors for spikes. You can confirm that it's enabled in **Settings > Projects > _Select Project_ > General Settings**.
118+
Limited availability spike protection is a project-level tool that helps prevent quota overconsumption. It's enabled for every project by default, and when it's enabled, Sentry continually monitors for spikes. You can confirm that it's enabled in **[Project] > Settings > General Settings**.
119119

120-
The way our spike protection algorithm essentially works is by using a weighted average of your events over the past 168 hours (past 7 days), applying a multiplier to that number, comparing this final number against a floor bound that is determined using your quota, and setting that as your spike limit.
120+
Our spike protection algorithm does the following:
121121

122-
#### Spike Protection Inputs
122+
- Uses a weighted average of your events over the past 168 hours (seven days)
123+
- Applies a multiplier to that number
124+
- Compares this final number against a minimum number of events, determined using your quota, to trigger a spike
125+
- Sets this as your spike limit
123126

124-
- Number of projects
125-
- Quota (per event type)
126-
- Events in the past 7 days
127+
#### Setting the Spike Limit
127128

128-
#### Floor Bound Calculation
129+
There are two ways that we can set your spike limit, or the number of events that trigger a spike:
129130

130-
To break it down even further, the first step of this algorithm identifies a floor bound that is calculated using your quota. This bound takes the max of either 500 events or (3 \* your quota)/(720 \* number of projects) - the latter number represents your project using up 3 times your overall quota in 30 days if events are continually ingested at this hourly rate, thus flagging for a potential spike.
131+
- [Minimum Event Calculation](#minimum-event-calculation) - A calculation that determines a minimum number of events
132+
- [Usage-Based Calculation](#usage-based-calculation) - A projection based on your past usage
131133

132-
#### Spike Limit Calculation
134+
The spike limit for each hour is set using either the minimum event or usage-based calculation, whichever is higher. There are two reasons for this. First, the minimum event calculation protects smaller or new projects: a new project that doesn't have a week's worth of data to calibrate spike limits can use this minimum number of events, an adaptation of the organization's quota, to approximate appropriate limits. Second, the minimum reduces false positives in smaller or new projects, so that spikes aren't flagged incorrectly.
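The "whichever is higher" rule can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not Sentry's implementation; the function and parameter names are hypothetical, and both inputs are assumed to be hourly event counts that have already been computed.

```python
def hourly_spike_limit(minimum_events: float, usage_based: float) -> float:
    """Return the spike limit for one hour.

    Hypothetical helper: takes the result of the minimum event
    calculation and the usage-based calculation and keeps the higher one.
    """
    return max(minimum_events, usage_based)
```

For a new project with little history, the usage-based value would be low, so the minimum would win; for an established, high-volume project, the usage-based value would typically be the higher of the two.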
133135

134-
The next step uses hourly data from the past 7 days to calculate spike limit projections for the next 7 days. This data is used to calculate weighted averages, which takes into account weekly and hourly seasonality. For example, the weighted average calculated for Monday at 3 pm is more heavily influenced by data points on Monday or hours around 3 pm. This weighted average is then multiplied by a multiplier that is 5 times the overall standard deviation of the past week - this multiplier is bounded between 3 and 6.
136+
Spike limits are recalculated in real time throughout the duration of the spike to adjust for the increasing volume of incoming events. This allows the limit to grow at a steady rate such that quota is protected from being quickly consumed. [An example](#example) of how this works during a spike is shown below.
135137

136-
#### Setting the Final Limit
138+
##### Minimum Event Calculation
137139

138-
The final spike limit for each hour is set to the max of the floor bound or the calculated limit. This is done for a multitude of reasons - firstly, using the floor bound protects smaller or new projects. New projects that do not have a week’s worth of data to use to calibrate spike limits can use the floor, an adaptation of the organization’s quota, to approximate appropriate limits. Additionally, the floor can be used to minimize false positives in smaller/new projects such that spikes aren’t flagged incorrectly.
140+
This calculation, which is the first step of our algorithm, identifies a minimum number of events, using your quota as a guide. This number is the maximum of either 500 events or the result of the following formula: `(3 * your quota)/(720 * number of projects)`. The formula represents your project using up three times your overall quota in 30 days if events are continually ingested at this hourly rate, thus flagging the project for a potential spike.
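The minimum event calculation can be sketched in Python as follows. This is a minimal illustration of the formula as written above, not Sentry's actual code; the function and parameter names are hypothetical, and the quota is assumed to be an event count for a 30-day period.

```python
def minimum_event_limit(quota: int, num_projects: int) -> float:
    """Hourly minimum: the larger of 500 events or (3 * quota) / (720 * projects).

    Hypothetical helper illustrating the documented formula.
    """
    if num_projects < 1:
        raise ValueError("num_projects must be at least 1")
    # 720 is the number of hours in 30 days (24 * 30).
    quota_based = (3 * quota) / (720 * num_projects)
    return max(500.0, quota_based)
```

With a 500k quota and a single project (an assumption, since the project count isn't stated), this gives roughly 2083 events per hour, which is the same value as the first-hour limit in the example in this section. With a 100k quota spread over four projects, the quota-based value is about 104, so the 500-event minimum applies instead.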
139141

140-
Additionally, at the onset of a spike, spike limits are recalculated in real time throughout the duration of the spike. While this is done to adjust for the increasing volume of incoming events, the limit grows at a steady rate such that quota is protected and not blown through. An example of how our heuristic works during a spike is shown below.
142+
##### Usage-Based Calculation
141143

142-
#### Example Calculations
144+
This calculation, which is the second step of our algorithm, uses hourly data from the past seven days to determine spike limit projections for the next seven days. This data is used to calculate weighted averages, which take into account weekly and hourly seasonality. For example, the weighted average calculated for Monday at 3 pm is more heavily influenced by data points on Monday or the hours around 3 pm. This weighted average is then multiplied by a multiplier that is `5` times the overall standard deviation of the past week; this multiplier is bounded between `3` and `6`.
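The bounded multiplier can be sketched in Python. This is an illustration under stated assumptions, not Sentry's implementation: the text doesn't specify how the weights are chosen or how the standard deviation is scaled, so both the weighted average and the standard deviation are treated here as precomputed inputs, and all names are hypothetical.

```python
def bounded_multiplier(std_dev: float) -> float:
    """Return 5 * std_dev, bounded between 3 and 6.

    Assumption: std_dev is already in whatever scale the algorithm uses.
    """
    raw = 5 * std_dev
    return min(max(raw, 3.0), 6.0)


def usage_based_limit(weighted_average: float, std_dev: float) -> float:
    """Scale the seasonally weighted hourly average by the bounded multiplier."""
    return weighted_average * bounded_multiplier(std_dev)
```

Because of the bounds, the usage-based limit for an hour always falls between three and six times that hour's weighted average, however stable or volatile the past week was.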
143145

146+
#### Example
147+
148+
In this example, the project usually ingests 100-200 events per hour. There's been a spike that’s reached 50,000 events, as shown in the graph below:
144149
![Spike zoomed out](spike-protection-zoomed-out.png)
145150

151+
In the following graph, we can see a zoomed-in view of the 12-hour period of the spike, along with a line indicating the spike limit as it's being recalculated over the course of the spike:
146152
![Spike zoomed in plotted with spike limits](spike-protection-zoomed-in.png)
147153

148-
**_During Spike_**
154+
Throughout the spike, the recalculating limit has the following effect:
149155

150156
- 1st hour: 6k events ingested, limit is recalculated to 2083, 3917 events dropped
151157
- 2nd hour: 34k events ingested, limit is recalculated to 2873, 31217 events dropped
152158
- 3rd hour: 55k events ingested, limit is recalculated to 5452, ~49k events dropped
153159
- 4th hour: 49k events ingested, limit is recalculated to 7628, ~41k events dropped
154160
- 5th hour: 41k events ingested, limit is recalculated to 9371, ~31k events dropped
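The per-hour figures above are consistent with a simple subtraction: events beyond the hourly limit are dropped. A minimal Python sketch of that arithmetic is below; the function name is hypothetical, and it assumes the rounded "6k" figure means exactly 6,000 events.

```python
def events_dropped(ingested: int, spike_limit: int) -> int:
    """Events over the hourly spike limit; never negative.

    Hypothetical helper showing the arithmetic behind the list above.
    """
    return max(0, ingested - spike_limit)
```

For the first hour, `events_dropped(6000, 2083)` gives 3917, matching the list. An hour at the project's usual volume, such as 150 events, stays under the limit and drops nothing.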
155161

156-
Limits are recalculated throughout the duration of the spike.
157-
158162
For this particular example:
159163

160-
- Org Quota: 500k
161-
- Events Ingested: ~478k
162-
- Events ~157k
164+
- Org quota: 500k
165+
- Events ingested during the spike: ~478k
166+
- Events accepted overall: ~157k
163167

164168
Here's an example of spike limit projections for a week, taking into account seasonality:
165169

166170
![Spike limit projections with seasonality](spike-protection-steady-state.png)
167171

172+
These regular differences in event ingestion don't cause a spike to occur.
173+
174+
#### Bursty Projects
175+
176+
There may be instances where a project routinely accepts a high volume of events in a very short period of time by design, for example, projects that orchestrate cron/Airflow jobs or task runners. The screenshot below shows an example of this kind of behavior:
177+
178+
![A "bursty" project with intentional spikes.](bursty_projects.png)
179+
180+
If this is expected behavior for a given project in your organization, you may want to consider turning off spike protection in the project settings to ensure necessary events aren't dropped.
181+
168182
## 2. Adjusting Quotas
169183

170184
<Note>
