Description
In #23884 (and #3890) we added the fixed_auto_queue_size threadpool, which can automatically raise or lower the queue size of the search threadpool based on the arrival rate of operations and a target response time.
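The Elasticsearch documentation describes the fixed_auto_queue_size adjustment as being based on Little's Law (L = λW). The sketch below is a simplified illustration only, not the actual threadpool code. It sets the desired queue length to the measured arrival rate λ multiplied by the target response time W, then clamps it to a minimum and maximum. The class, method, and parameter names are hypothetical.

```java
// Hypothetical sketch: Little's Law (L = lambda * W) applied to queue sizing.
final class LittlesLawQueueSizing {

    /**
     * @param arrivalRatePerSec observed task arrival rate (lambda), in tasks per second
     * @param targetResponseSec target response time (W), in seconds
     * @param minQueueSize      lower bound for the queue (hypothetical parameter)
     * @param maxQueueSize      upper bound for the queue (hypothetical parameter)
     * @return desired queue size L, clamped to [minQueueSize, maxQueueSize]
     */
    static int desiredQueueSize(double arrivalRatePerSec, double targetResponseSec,
                                int minQueueSize, int maxQueueSize) {
        long littlesLaw = Math.round(arrivalRatePerSec * targetResponseSec);
        return (int) Math.max(minQueueSize, Math.min(maxQueueSize, littlesLaw));
    }

    public static void main(String[] args) {
        System.out.println(desiredQueueSize(2000.0, 1.0, 10, 1000)); // 1000 (clamped down from 2000)
        System.out.println(desiredQueueSize(200.0, 1.0, 10, 1000));  // 200
        System.out.println(desiredQueueSize(1.0, 1.0, 10, 1000));    // 10 (raised to the minimum)
    }
}
```

Under this model, a higher arrival rate or a looser target response time lets the queue grow, while the bounds keep it from collapsing to zero or growing without limit.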
We'd like to take the next step and implement adaptive replica selection. This is a partial application of the C3 algorithm, used on the coordinating node to select the appropriate replica instead of our current round-robin behavior. We cannot currently implement the rate control and backpressure from the paper because we cannot treat every request as having identical cost. However, the automatic queue sizing already gives us a good way to apply backpressure on the execution nodes themselves.
The formula for ranking a replica s, Ψ(s), is given on page 6 of the linked paper (EWMA = exponentially weighted moving average):
Ψ(s) = R(s) - 1/µ̄(s) + (q̂(s))^b / µ̄(s)
Where q̂(s) is:
q̂(s) = 1 + (os(s) * n) + q(s)
Here (os(s) * n) is the "concurrency compensation", where os(s) is the number of outstanding requests to node s and n is the number of clients in the system. R(s) is an EWMA of the response time as seen from the coordinating node. q(s) is an EWMA of the queue size reported by the execution node. 1/µ̄(s) is an EWMA of the service time reported by the execution node, so µ̄(s) itself is the service rate.
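The following minimal Java sketch computes q̂(s) and Ψ(s) exactly as written in the formulas above, with b passed in as a parameter. The class and parameter names are hypothetical and are not Elasticsearch's API. The sketch takes the service time EWMA as input, so µ̄(s) is derived as its reciprocal. Lower scores are better, so the coordinating node would pick the replica with the smallest Ψ(s).

```java
/** Hypothetical sketch of the C3 replica score Psi(s); lower is better. */
final class ReplicaRank {

    /**
     * @param responseTimeEwma    R(s): EWMA of response time seen by the coordinating node
     * @param serviceTimeEwma     1/muBar(s): EWMA of service time reported by the execution node
     * @param queueSizeEwma       q(s): EWMA of queue size reported by the execution node
     * @param outstandingRequests os(s): requests currently in flight to this node
     * @param numClients          n: number of clients in the system
     * @param b                   queue penalty exponent (the paper uses 3)
     */
    static double rank(double responseTimeEwma, double serviceTimeEwma, double queueSizeEwma,
                       long outstandingRequests, int numClients, double b) {
        double muBar = 1.0 / serviceTimeEwma;                      // service rate muBar(s)
        double concurrencyCompensation = (double) outstandingRequests * numClients;
        double qHat = 1 + concurrencyCompensation + queueSizeEwma; // qHat(s)
        return responseTimeEwma - 1.0 / muBar + Math.pow(qHat, b) / muBar;
    }

    public static void main(String[] args) {
        // Same response and service times; node B has a longer queue, so it ranks worse.
        double rankA = rank(10.0, 2.0, 1.0, 0, 3, 3);
        double rankB = rank(10.0, 2.0, 3.0, 0, 3, 3);
        System.out.println(rankA); // 24.0
        System.out.println(rankB); // 136.0
    }
}
```

In this example only the queue term differs. With b = 3, q̂ = 2 versus q̂ = 4 produces queue penalties of 16 versus 128, so the node with the shorter queue wins by a wide margin.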
Implementing this requires several steps:
- Track an EWMA of task execution time (service time) for requests on the execution node (Track EWMA[1] of task execution time in search threadpool executor #24989)
- Piggyback the service time EWMA and current queue size from the execution node back to the coordinating node with the search response (Register data node stats from info carried back in search responses #25430)
- Track an EWMA of the response time of each execution node on the coordinating node (Register data node stats from info carried back in search responses #25430)
- Track an EWMA of the queue size on the coordinating node (Register data node stats from info carried back in search responses #25430)
- Implement the actual adaptive replica ranking on the coordinating node when deciding which copy of the data to execute the read operation on (Implement adaptive replica selection #26128)
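The EWMA-tracking steps in the list above all rely on the same primitive. This minimal sketch assumes the textbook update avg ← α·sample + (1 − α)·avg and assumes that the first sample seeds the average. Ewma and CoordinatorNodeStats are hypothetical names. Following the list, the execution node already averages the service time before sending it, so the coordinating node simply stores the latest value. Response time and queue size are smoothed on the coordinating node.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Minimal EWMA: avg <- alpha * sample + (1 - alpha) * avg, guarded by synchronization. */
final class Ewma {
    private final double alpha;
    private double average;
    private boolean seeded;

    Ewma(double alpha) {
        if (!(alpha > 0 && alpha <= 1)) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    synchronized void add(double sample) {
        if (!seeded) {
            average = sample; // assumption: the first sample seeds the average
            seeded = true;
        } else {
            average = alpha * sample + (1 - alpha) * average;
        }
    }

    synchronized double get() {
        return average;
    }
}

/** Hypothetical per-node statistics kept by the coordinating node. */
final class CoordinatorNodeStats {
    final Ewma responseTime;                         // R(s): measured on the coordinating node
    final Ewma queueSize;                            // q(s): smoothed from piggybacked queue sizes
    volatile double serviceTime;                     // 1/muBar(s): EWMA from the execution node, stored as received
    final AtomicLong outstanding = new AtomicLong(); // os(s): requests in flight to this node

    CoordinatorNodeStats(double alpha) {
        this.responseTime = new Ewma(alpha);
        this.queueSize = new Ewma(alpha);
    }

    void onRequestSent() {
        outstanding.incrementAndGet();
    }

    void onResponse(double responseTimeMillis, double serviceTimeEwmaMillis, int queueSizeAtNode) {
        outstanding.decrementAndGet();
        responseTime.add(responseTimeMillis);
        queueSize.add(queueSizeAtNode);
        serviceTime = serviceTimeEwmaMillis;
    }

    public static void main(String[] args) {
        CoordinatorNodeStats stats = new CoordinatorNodeStats(0.5);
        stats.onRequestSent();
        stats.onRequestSent();
        stats.onResponse(10.0, 4.0, 2);
        stats.onResponse(20.0, 5.0, 6);
        System.out.println(stats.outstanding.get());  // 0
        System.out.println(stats.responseTime.get()); // 15.0
        System.out.println(stats.queueSize.get());    // 4.0
        System.out.println(stats.serviceTime);        // 5.0
    }
}
```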
There is some flexibility here, since we could possibly use existing metrics (such as the "took" time) instead of adding new measurements; this is only a rough overview.
Additionally, we will need to choose a value for b that adequately penalizes long queues (the paper uses 3), as well as a good α (smoothing factor) for the EWMA calculations. We could also make these configurable if desired.
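A short worked example shows how b and α behave. The numbers are illustrative, not measurements. For two replicas with the same µ̄(s), the queue terms of Ψ(s) differ only by the ratio of their q̂ values raised to the power b. Meanwhile, α controls how quickly older samples are forgotten in the standard EWMA update:

```math
\frac{\hat{q}(s_1)^b / \bar{\mu}}{\hat{q}(s_2)^b / \bar{\mu}} = \left(\frac{\hat{q}(s_1)}{\hat{q}(s_2)}\right)^b,
\qquad \hat{q}(s_1) = 4,\ \hat{q}(s_2) = 2:\quad b = 1 \Rightarrow 2,\quad b = 3 \Rightarrow 8
```

```math
\bar{x}_t = \alpha x_t + (1 - \alpha)\,\bar{x}_{t-1}
\quad\Rightarrow\quad \text{weight of a sample } k \text{ steps old} = \alpha (1 - \alpha)^k
```

So with b = 3, a node whose effective queue is twice as long receives a queue penalty eight times as large. A larger α tracks changes faster, at the cost of noisier estimates.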