@rmdmattingly
Contributor

We've been using this at HubSpot quite successfully, and would like to contribute it.

Throttling has been great for us at HubSpot — we've written about the success here. But we've had a couple of problems at scale:

  1. Users often fail to consistently use a meaningful proportion of their throttle. Because backoff/retry is inefficient, and because clients spike and back off in tandem around refill intervals, users are often confused to see both a steady-state utilization of only 20-70% of their configured allowance and a steady stream of throttling exceptions.
  2. Counterintuitively, depending on the shape of the workload, we also have the opposite problem. Access patterns with many machines and many threads all competing for the same contended throttle allowance may inadvertently DDoS the RegionServer's RPC layer with thousands upon thousands of doomed, redundant retries. This is due to the naive way in which we calculate the RpcThrottlingException's wait interval: we simply assume that the next sufficient refill will be an opportune time to retry, which only really makes sense in a single-threaded environment.
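
To make the second problem concrete, here is a minimal sketch of a deliberately simplified model, not HBase code. It assumes every request costs one unit of quota and that every throttled caller retries at the very next refill. The class and method names are hypothetical.

```java
/** Simplified model of callers that all retry at the next refill (illustrative only). */
final class RetryHerdSketch {

  /**
   * Counts the rejected attempts that occur before every caller is served,
   * assuming unit-cost requests and that all rejected callers retry together
   * at the next refill.
   */
  static long throttledAttempts(int callers, int limitPerInterval) {
    if (limitPerInterval <= 0) {
      throw new IllegalArgumentException("limitPerInterval must be positive");
    }
    long throttled = 0;
    int waiting = callers;
    while (waiting > 0) {
      int served = Math.min(waiting, limitPerInterval);
      waiting -= served;
      // Everyone still waiting was rejected this interval and will retry at the next refill.
      throttled += waiting;
    }
    return throttled;
  }

  public static void main(String[] args) {
    System.out.println("callers=10,   limit=10 -> " + throttledAttempts(10, 10));
    System.out.println("callers=1000, limit=10 -> " + throttledAttempts(1000, 10));
  }
}
```

Under these assumptions the number of wasted attempts grows roughly quadratically with the number of competing callers, which is the retry storm described above.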

To fix this, we've implemented a FeedbackAdaptiveRateLimiter, inspired by Philipp Janert's Feedback Control for Computer Systems.

The FeedbackAdaptiveRateLimiter (FARL) works much like the FixedIntervalRateLimiter, but adds logic for dynamic wait interval multiplication and modest oversubscription. Together these drive fuller, more consistent quota utilization while serving only a fraction of the previous RpcThrottlingException volume.

The additional FARL logic can be described with the following categories:

1. Closed-Loop Feedback Control

The limiter implements a classic closed-loop control system where:

  • Setpoint: The configured resource limit (e.g., 10 requests/second)
  • Process Variable: Actual resource consumption over time
  • Control Actions: Adjusting backoff multipliers and oversubscription proportions
  • Feedback: Monitoring contention and utilization metrics to adapt behavior

2. Proportional Control with Integral Behavior

The implementation uses two separate control mechanisms:

Backoff Multiplier Control (hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/FeedbackAdaptiveRateLimiter.java:221-227):

if (hadContentionThisInterval) {
  currentBackoffMultiplier.set(Math.min(
    currentBackoffMultiplier.get() + backoffMultiplierIncrement,
    maxBackoffMultiplier));
} else {
  currentBackoffMultiplier.set(Math.max(
    currentBackoffMultiplier.get() - backoffMultiplierDecrement, 1.0));
}

This is essentially an integral controller that accumulates error over time—increasing pressure
when contention is detected, decreasing it when there's none.
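
As a rough illustration of how the multiplier moves over time, the sketch below replays the same update rule against a made-up sequence of intervals. The increment and decrement values are hypothetical and chosen for readability. Only the cap of 10.0 and the floor of 1.0 come from this description.

```java
/** Replays the additive backoff-multiplier update over a sequence of intervals (illustrative). */
final class BackoffMultiplierSketch {
  // Hypothetical step sizes; the PR's actual defaults are not shown here.
  static final double INCREMENT = 0.5;
  static final double DECREMENT = 0.25;
  // Cap taken from the maxBackoffMultiplier default quoted in the description.
  static final double MAX_MULTIPLIER = 10.0;

  /** Returns the next multiplier given whether the last interval saw contention. */
  static double next(double current, boolean hadContention) {
    return hadContention
      ? Math.min(current + INCREMENT, MAX_MULTIPLIER)
      : Math.max(current - DECREMENT, 1.0);
  }

  public static void main(String[] args) {
    boolean[] contention = { true, true, true, false, false, false, false, false, false, false };
    double multiplier = 1.0;
    for (int i = 0; i < contention.length; i++) {
      multiplier = next(multiplier, contention[i]);
      System.out.printf("interval=%d contention=%b multiplier=%.2f%n",
        i, contention[i], multiplier);
    }
  }
}
```

With asymmetric steps like these, the multiplier climbs quickly under contention and relaxes more slowly once contention stops.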

Oversubscription Control (hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/FeedbackAdaptiveRateLimiter.java:229-236):

if (avgUtil < minTargetUtilization) {
  oversubscriptionProportion.set(Math.min(
    oversubscriptionProportion.get() + oversubscriptionIncrement,
    maxOversubscription));
} else if (avgUtil >= maxTargetUtilization) {
  oversubscriptionProportion.set(Math.max(
    oversubscriptionProportion.get() - oversubscriptionDecrement, 0.0));
}
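
The three possible outcomes of this branch (raise, lower, or hold) can be sketched as a pure function. The step sizes below are hypothetical. The target bounds assume the default 0.025 error budget around a target utilization of 1.0, and the 0.25 cap is the maxOversubscription default from this description.

```java
/** Sketch of the oversubscription update as a pure function (illustrative values). */
final class OversubscriptionSketch {
  // Bounds derived from the default error budget: 1.0 - 0.025 and 1.0 + 0.025.
  static final double MIN_TARGET_UTILIZATION = 0.975;
  static final double MAX_TARGET_UTILIZATION = 1.025;
  static final double MAX_OVERSUBSCRIPTION = 0.25;
  // Hypothetical step sizes, chosen only for this example.
  static final double INCREMENT = 0.01;
  static final double DECREMENT = 0.005;

  /** Returns the next oversubscription proportion for the given smoothed utilization. */
  static double next(double current, double avgUtil) {
    if (avgUtil < MIN_TARGET_UTILIZATION) {
      return Math.min(current + INCREMENT, MAX_OVERSUBSCRIPTION); // under-utilized: allow more
    } else if (avgUtil >= MAX_TARGET_UTILIZATION) {
      return Math.max(current - DECREMENT, 0.0); // over-utilized: pull back
    }
    return current; // inside the target range: no control action
  }

  public static void main(String[] args) {
    double[] utilizations = { 0.60, 0.80, 0.95, 1.00, 1.01, 1.05 };
    double proportion = 0.0;
    for (double util : utilizations) {
      proportion = next(proportion, util);
      System.out.printf("avgUtil=%.2f oversubscription=%.3f%n", util, proportion);
    }
  }
}
```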

3. Exponential Moving Average (EMA) for Smoothing

The system uses EMA to track utilization (hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/FeedbackAdaptiveRateLimiter.java:216-218):

double util = (double) consumed / intendedUsage;
utilizationEma = emaAlpha * util + (1.0 - emaAlpha) * utilizationEma;

This is a standard signal processing technique from control theory to filter out noise and
respond smoothly to changes in system behavior.
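
A small sketch of the smoothing effect, assuming a hypothetical alpha of 0.3 (the actual emaAlpha default is not stated in this description). A single low-utilization interval moves the average only part of the way toward the outlier.

```java
/** Shows how an exponential moving average damps a one-interval dip (illustrative). */
final class UtilizationEmaSketch {
  // Hypothetical smoothing factor; higher values react faster but smooth less.
  static final double ALPHA = 0.3;

  /** Standard EMA update: blend the new sample with the previous average. */
  static double update(double ema, double sample, double alpha) {
    return alpha * sample + (1.0 - alpha) * ema;
  }

  public static void main(String[] args) {
    double[] samples = { 1.0, 1.0, 0.2, 1.0, 1.0 };
    double ema = 1.0;
    for (double sample : samples) {
      ema = update(ema, sample, ALPHA);
      System.out.printf("sample=%.2f ema=%.3f%n", sample, ema);
    }
  }
}
```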

4. Saturation Limits (Anti-Windup)

The implementation includes caps on both control parameters:

  • maxBackoffMultiplier (default: 10.0)
  • maxOversubscription (default: 0.25)

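This guards against "integral windup", a common problem in control systems where the accumulated
correction keeps growing while the output is saturated, so the controller overshoots dramatically
and is slow to recover.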
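
One way to see why the cap matters is to count how many quiet intervals it takes the multiplier to return to 1.0 after a long burst of contention, with and without a cap. This is a sketch with hypothetical step sizes, not a measurement of the real limiter.

```java
/** Compares recovery time after sustained contention with and without a cap (illustrative). */
final class WindupSketch {

  /**
   * Returns the number of contention-free intervals needed for the multiplier to fall
   * back to 1.0 after the given number of consecutive contended intervals.
   */
  static int recoveryIntervals(int contended, double increment, double decrement, double cap) {
    double multiplier = 1.0;
    for (int i = 0; i < contended; i++) {
      multiplier = Math.min(multiplier + increment, cap);
    }
    int quiet = 0;
    while (multiplier > 1.0) {
      multiplier = Math.max(multiplier - decrement, 1.0);
      quiet++;
    }
    return quiet;
  }

  public static void main(String[] args) {
    // Hypothetical step sizes of 0.5 up and 0.25 down.
    System.out.println("capped at 10.0: "
      + recoveryIntervals(1000, 0.5, 0.25, 10.0));
    System.out.println("uncapped:       "
      + recoveryIntervals(1000, 0.5, 0.25, Double.POSITIVE_INFINITY));
  }
}
```

With the cap, recovery time is bounded by (maxBackoffMultiplier - 1) / backoffMultiplierDecrement regardless of how long the contention lasted.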

5. Error Budget / Deadband

The utilizationErrorBudget parameter (hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/FeedbackAdaptiveRateLimiter.java:95-97) creates a deadband around the target utilization (1.0):

  • Target range: [0.975, 1.025] (with default 0.025 error budget)
  • No control action taken when utilization is within this range
  • This prevents oscillation around the setpoint—a key stability consideration in control theory

6. Dual-Control Strategy

The system elegantly addresses two different control objectives:

  1. Fast Response (Backoff Multiplier): Responds immediately to contention on a per-interval basis
  2. Long-term Optimization (Oversubscription): Uses the EMA to gradually tune the system for
    optimal steady-state utilization

This mirrors the proportional + integral (PI) control pattern where you need both fast response
to disturbances and elimination of steady-state error.

Why This Design?

The traditional fixed-interval rate limiter suffers from two problems this feedback control
approach solves:

  1. Thundering herd: Multiple threads hitting the limiter simultaneously cause spikes
  2. Under-utilization: Conservative limits lead to wasted capacity

By applying control theory:

  • The backoff multiplier increases backpressure to spread out concurrent requests (damping)
  • The oversubscription allows slight over-limit to achieve full utilization without violating
    limits on average
  • The EMA provides stability by not reacting to transient spikes
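
As a rough sketch of how the two control outputs might be applied, the helpers below scale a base wait interval by the backoff multiplier and scale the per-interval allowance by the oversubscription proportion. This is an assumption about the shape of the calculation based on the description above, not the actual FeedbackAdaptiveRateLimiter code, and the method names are hypothetical.

```java
/** Hypothetical application of the two control outputs (not the PR's implementation). */
final class ControlOutputsSketch {

  /** Stretches the naive wait interval so that competing callers retry later. */
  static long adjustedWaitMs(long baseWaitMs, double backoffMultiplier) {
    return (long) Math.ceil(baseWaitMs * backoffMultiplier);
  }

  /** Grants slightly more than the configured limit to make up for under-utilization. */
  static long effectiveAllowance(long limit, double oversubscriptionProportion) {
    return limit + (long) (limit * oversubscriptionProportion);
  }

  public static void main(String[] args) {
    System.out.println("wait:      " + adjustedWaitMs(100, 2.5) + " ms");
    System.out.println("allowance: " + effectiveAllowance(1000, 0.25));
  }
}
```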

In practice, we have deployed this with these default settings across all of our hundreds of clusters at HubSpot with great success, powering everything from live user-facing requests to async batch jobs. See our reduced RpcThrottlingException volume below (this is the average of our top n RegionServers by RTE/sec):
[Screenshot: RpcThrottlingException volume, averaged over our top n RegionServers by RTE/sec]

With the variety of configuration levers available in the FARL, you can adapt the rate limiter to suit whatever your priorities might be (strict backoffs, lenient oversubscription). Together with HBASE-29663, which made rate limiter configurations dynamically refreshable, this is a powerful combination for improving the usability and scalability of HBase's Quotas system.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 29s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+1 💚 mvninstall 4m 8s master passed
+1 💚 compile 3m 27s master passed
+1 💚 checkstyle 1m 5s master passed
+1 💚 spotbugs 2m 17s master passed
+1 💚 spotless 1m 12s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 40s the patch passed
+1 💚 compile 3m 49s the patch passed
+1 💚 javac 3m 49s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 7s the patch passed
+1 💚 spotbugs 2m 2s the patch passed
+1 💚 hadoopcheck 13m 16s Patch does not cause any errors with Hadoop 3.3.6 3.4.1.
+1 💚 spotless 0m 59s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 16s The patch does not generate ASF License warnings.
46m 5s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7396/2/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #7396
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 8b54e78ec17a 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 1d23753
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 85 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7396/2/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 37s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 3m 58s master passed
+1 💚 compile 1m 15s master passed
+1 💚 javadoc 0m 37s master passed
+1 💚 shadedjars 7m 24s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 3m 42s the patch passed
+1 💚 compile 1m 9s the patch passed
+1 💚 javac 1m 9s the patch passed
+1 💚 javadoc 0m 32s the patch passed
+1 💚 shadedjars 6m 36s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
+1 💚 unit 246m 43s hbase-server in the patch passed.
277m 59s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7396/2/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #7396
Optional Tests javac javadoc unit compile shadedjars
uname Linux a6711b505bd1 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 1d23753
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7396/2/testReport/
Max. process+thread count 3820 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7396/2/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@rmdmattingly rmdmattingly marked this pull request as ready for review October 18, 2025 12:31
Comment on lines +223 to +286
currentBackoffMultiplier.set(Math
.min(currentBackoffMultiplier.get() + backoffMultiplierIncrement, maxBackoffMultiplier));
} else {
currentBackoffMultiplier
.set(Math.max(currentBackoffMultiplier.get() - backoffMultiplierDecrement, 1.0));
Contributor

Be aware that the .get() and .set() calls on the AtomicDouble are not, taken together, an atomic operation.

Contributor Author

👍 Good observation. I believe this is okay because getWaitIntervalMs is synchronized and is the only public mechanism for triggering a refill (refill itself is protected), so races don't actually occur in practice.
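
For readers following this thread, here is a self-contained sketch of the two usual ways to make a read-modify-write on an atomic value safe. It uses java.util.concurrent.atomic.AtomicLong holding raw double bits as a stand-in for AtomicDouble, and the class and method names are hypothetical. The first method mirrors the reasoning above, where an enclosing lock serializes the get-then-set. The second is a lock-free alternative.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Two ways to make a capped increment safe under concurrency (illustrative only). */
final class CompoundUpdateSketch {
  // A double stored as raw long bits; stands in for an AtomicDouble.
  private final AtomicLong bits = new AtomicLong(Double.doubleToRawLongBits(1.0));

  double get() {
    return Double.longBitsToDouble(bits.get());
  }

  /** Option 1: get-then-set is safe only because every writer holds the same lock. */
  synchronized void bumpUnderLock(double increment, double max) {
    double next = Math.min(get() + increment, max);
    bits.set(Double.doubleToRawLongBits(next));
  }

  /** Option 2: updateAndGet retries a compare-and-set until the update applies cleanly. */
  void bumpWithCas(double increment, double max) {
    bits.updateAndGet(b ->
      Double.doubleToRawLongBits(Math.min(Double.longBitsToDouble(b) + increment, max)));
  }

  public static void main(String[] args) throws InterruptedException {
    CompoundUpdateSketch sketch = new CompoundUpdateSketch();
    Thread[] threads = new Thread[4];
    for (int t = 0; t < threads.length; t++) {
      threads[t] = new Thread(() -> {
        for (int i = 0; i < 1000; i++) {
          sketch.bumpWithCas(0.5, Double.MAX_VALUE);
        }
      });
      threads[t].start();
    }
    for (Thread thread : threads) {
      thread.join();
    }
    // No updates are lost: 1.0 + 4 * 1000 * 0.5
    System.out.println(sketch.get());
  }
}
```

A class should pick one of the two strategies for all writers, since a locked get-then-set can still interleave with a concurrent compare-and-set on the same value.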

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 11s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+1 💚 mvninstall 4m 42s master passed
+1 💚 compile 3m 59s master passed
+1 💚 checkstyle 1m 8s master passed
+1 💚 spotbugs 1m 58s master passed
+1 💚 spotless 0m 59s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 mvninstall 4m 21s the patch passed
+1 💚 compile 3m 57s the patch passed
+1 💚 javac 3m 57s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 7s the patch passed
+1 💚 spotbugs 2m 4s the patch passed
+1 💚 hadoopcheck 13m 16s Patch does not cause any errors with Hadoop 3.3.6 3.4.1.
+1 💚 spotless 0m 51s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 11s The patch does not generate ASF License warnings.
46m 48s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7396/3/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #7396
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux 4cd4532523b2 6.8.0-1024-aws #26~22.04.1-Ubuntu SMP Wed Feb 19 06:54:57 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 2b5fa7e
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 71 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7396/3/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 13s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 hbaseanti 0m 0s Patch does not have any anti-patterns.
_ master Compile Tests _
+1 💚 mvninstall 6m 1s master passed
+1 💚 compile 5m 12s master passed
+1 💚 checkstyle 1m 36s master passed
+1 💚 spotbugs 2m 40s master passed
+1 💚 spotless 1m 21s branch has no errors when running spotless:check.
_ Patch Compile Tests _
+1 💚 mvninstall 5m 27s the patch passed
+1 💚 compile 5m 4s the patch passed
+1 💚 javac 5m 4s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 28s the patch passed
+1 💚 spotbugs 2m 37s the patch passed
+1 💚 hadoopcheck 17m 35s Patch does not cause any errors with Hadoop 3.3.6 3.4.1.
+1 💚 spotless 1m 16s patch has no errors when running spotless:check.
_ Other Tests _
+1 💚 asflicense 0m 17s The patch does not generate ASF License warnings.
61m 40s
Subsystem Report/Notes
Docker ClientAPI=1.48 ServerAPI=1.48 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7396/4/artifact/yetus-general-check/output/Dockerfile
GITHUB PR #7396
Optional Tests dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless
uname Linux fdc97f2cb777 6.8.0-1024-aws #26~22.04.1-Ubuntu SMP Wed Feb 19 06:54:57 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 2b5fa7e
Default Java Eclipse Adoptium-17.0.11+9
Max. process+thread count 71 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7396/4/console
versions git=2.34.1 maven=3.9.8 spotbugs=4.7.3
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@rmdmattingly
Contributor Author

org.apache.hadoop.hbase.util.TestProcDispatcher.testRetryLimitOnConnClosedErrors test failure looks unrelated

@rmdmattingly rmdmattingly merged commit a79100b into apache:master Oct 22, 2025
1 check failed
@rmdmattingly rmdmattingly deleted the HBASE-29351 branch October 22, 2025 17:48
rmdmattingly added a commit that referenced this pull request Oct 22, 2025
Co-authored-by: Ray Mattingly <[email protected]>
Signed-off-by: Charles Connell <[email protected]>
rmdmattingly added a commit that referenced this pull request Oct 22, 2025
Co-authored-by: Ray Mattingly <[email protected]>
Signed-off-by: Charles Connell <[email protected]>
rmdmattingly added a commit that referenced this pull request Oct 24, 2025
Signed-off-by: Charles Connell <[email protected]>
Co-authored-by: Ray Mattingly <[email protected]>
rmdmattingly added a commit that referenced this pull request Oct 24, 2025
Signed-off-by: Charles Connell <[email protected]>
Co-authored-by: Ray Mattingly <[email protected]>