-
Couldn't load subscription status.
- Fork 6
thread-local cache of left-right readhandles #903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
25a5fb8 to
fd16542
Compare
5de9cd8 to
7fa934e
Compare
5f54e67 to
c772a0d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a new crate left-right-tlcache that provides thread-local caching of left-right read handles. The solution addresses the challenge of safely sharing read handles across multiple threads when the set of handles may not be known upfront and can change over time.
Key changes:
- Implements a thread-local cache system for
ReadHandle<T>instances - Provides traits for read handle provisioning and identity management
- Includes comprehensive test coverage with various scenarios
Reviewed Changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| left-right-tlcache/src/lib.rs | Core implementation of thread-local read handle cache with traits, cache logic, and extensive tests |
| left-right-tlcache/Cargo.toml | Package configuration for the new crate with required dependencies |
| Cargo.toml | Workspace configuration updates to include the new crate |
c772a0d to
79683ab
Compare
left-right-tlcache/src/lib.rs
Outdated
| // store a new entry locally with a handle, its identity and version, for the given key | ||
| let rhandle = Rc::new(fresh); | ||
| let entry = ReadHandleEntry::new(identity, Rc::clone(&rhandle), version); | ||
| let rhandle = Rc::new(factory.handle()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand the race that is being protected against here, nor do I see how this achieves any race protection. Perhaps a conversation about this is in order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I've updated the commit message to make it more clear.
It is very unlikely that this would happen, but I believe possible:
- Suppose you have a thread X with one rhandle and the current version is N
- There is provider holding a collection that may change without X's control
- Thread X queries the provider to validate its rhandle, by requesting the version. Provider returns N + k.
(this is to avoid needing to request the identity which would require some more expensive lookup). - Since N + k != N, then thread X needs to request the identity of the rhandle
... but between (3) and (4) the contents of the collection handled by the provider may have changed and the version be N+k', k'>k. As a result, thread X would get an rhandle for version N+k' but believe it was for version N+k. Nothing terrific would happen because, on a subsequent check, it would see that N+k != N+k' and re-request the identity again. But the race exists 🤷 ?
Since the fix is trivial, I added it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In our case (with the provider we use) it may well be that this never happens.
| } | ||
|
|
||
| // create local cache | ||
| make_thread_local_readhandle_cache!(TEST_CACHE, u64, TestStruct); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why make this global, why not per test so that you don't have to serialize things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests are very short. I can change that if needed, but we already use the crate to serialize tests.
|
|
||
| #[serial] | ||
| #[test] | ||
| fn test_readhandle_cache() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are the actual multithreaded shuttle tests? How do you know this works with actual concurrency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was planning to add them for the fibtable (still have to) in our actual usage of this.
The only evidence available so far is that all the existing tests pass and the issues seen before don't show up.
Will add tests regardless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, I will add those in #918 because this PR is just for the cache.
79683ab to
eda6dab
Compare
c0accdb to
807bbf3
Compare
Add a test provider and several tests. Tests are serialized because they run in the same thread and access the same cache. That could be avoided using distinct caches,though. Signed-off-by: Fredi Raspall <[email protected]>
A FibReaderFactory object allows creating FibReaders. Signed-off-by: Fredi Raspall <[email protected]>
This commit contains 3 changes:
1) Rename FibId -> FibKey and clarify usage. The old FibId is
used in two ways: one, to identify a Fib; another, to look up
fibs in the Fibtable. Rename it to FibKey since, while still
used as an Id, the main purpose is to act as a key.
Also, change some internal APIs to make sure that Fibs are
identified by a single FibKey (based on the vrf id).
2) Restructure FibTable: FibReaders (which wrap ReadHandles<Fib>)
cannot be shared among threads. Instead of storing a FibReader
for each Fib, store a FibReaderFactory from which readers
can be obtained on a per-thread basis. In this patch, every
time a Fib needs to be accessed, a new FibReader is created,
which is very inefficient due to the need for locking
when creating ReadHandles. Subsequent commits will fix this.
We also keep the Id of a Fib along with its factory. This is
needed to be able to efficiently implement the traits needed
for the readhandle cache. On this, note that:
- the fibtable could keep fibreaders as it did, since new
can be obtained via cloning. However, keeping instead
factories protects us from errorneously handing off a reference
to a FibReader to more than a thread.
- if we would store fibreaders (and clone them on demand),
we would not need to store the Id (since it could be obtained
entering the fib via the readhandle), but the number of
readers would be increased.
3) Elements in the new FibTable are stored within an Arc. This is
to ensure that the compiler automatically determines that
FibTable is Sync.
4) Avoid needing to derive/implement Clone() for FibTable by adding
a dummy sync_from() Absorb implementation, by publishing on
FibTable creation.
Signed-off-by: Fredi Raspall <[email protected]>
Signed-off-by: Fredi Raspall <[email protected]>
The thread-local cache of prior commit stores ReadHandle<Fib>. However, the rest of the code uses transparent wrapper FibReader, with additional methods. Implement AsRef so that we can have FibReaders with zero cost. Signed-off-by: Fredi Raspall <[email protected]>
Add left-right-tlcache as dependency to implement the fibtable cache. Implement trait Identity for Fib so that the thread-local cache of ReadHandle<Fib>s can determine if a read handle points to the right object or the entry needs to be invalidated. Signed-off-by: Fredi Raspall <[email protected]>
Versioning the fibtable will allow us to determine the validity of cached readhandles without needing to ask the provider, which would add an extra lookup (e.g. hash or similar), on every cache lookup. This patch adds the version for the FibTable and the logic to monotonically increment its value anytime it is updated. Note that version is u64 instead of AtomicU64 since we don't need to synchronize multiple threads on its value. Signed-off-by: Fredi Raspall <[email protected]>
Implement AsRef<ReadHandleFactory<Fib>> for FibReaderFactory so that it can be treated as a ReadHandleFactory<Fib>. Signed-off-by: Fredi Raspall <[email protected]>
Declare thread-local ReadHandleCache `FIBTABLE_CACHE`. Implement trait ReadHandleProvider for FibTable so that it can be polled by the cache, per thread. We could implement ReadHandleProvider for FibTableReader instead. However, implementing it for the FibTable is more general in that it would also work if the FibTable itself was not wrapped in left- right. Signed-off-by: Fredi Raspall <[email protected]>
... from Rc<ReadHandle<Fib>>. The cache stores ReadHandle<T>'s for some T. In the case of the FibTable, T=Fib and we want to hand off FibReaders (a thin wrapper of ReadHandle<Fib>) to the threads using the cache. Add a method to perform such a conversion. Signed-off-by: Fredi Raspall <[email protected]>
Add the main method to lookup FibReaders from a FibKey from the thread-local cache and a FibTableReader. The method will check the local cache to return FibReaders owned by the calling thread. On failure, it will pull data from provider (fibtable) accessible from the same FibTableReader. Signed-off-by: Fredi Raspall <[email protected]>
Let the ipforward NF look up FibReader from thread-local cache. Signed-off-by: Fredi Raspall <[email protected]>
Signed-off-by: Fredi Raspall <[email protected]>
Extend the thread-local cache with full refresh logic and iterator. This requires adding a new method to trait ReadHandleProvider. Signed-off-by: Fredi Raspall <[email protected]>
impl ReadHandleProvider get_iter() for FibTable. This is needed to allow the thread handling the cli to safely iterate over the fibs. Signed-off-by: Fredi Raspall <[email protected]>
The existing iterator implementation is subobtimal in that each call to get_iter() needs to request the provider a full iterator for the whole set of read handle factories, unless refresh arg is set to false, and then refresh the local cache with read handles built from each factory. This is inefficient in case the set of objects in the provider rarely changes (which may be the case in most setups). Avoid pulling an iterator from the provider if the version did not change, as this means that no objects have been added or removed. If anything, we remove unusable readers. Signed-off-by: Fredi Raspall <[email protected]>
Fix the code so that a cache keeps the minimal number of rhandles after a refresh. The previous code would keep an independent rhandle for aliases, meaning that if there are N objects each with one allias, 2 x N rhandles would exist after a refresh. Solve this by letting aliases use the same rhandle as their primary. Signed-off-by: Fredi Raspall <[email protected]>
Signed-off-by: Fredi Raspall <[email protected]>
Signed-off-by: Fredi Raspall <[email protected]>
Misc small cleanups: - adjust method visibility - use more reasonable tracing levels - remove unused methods - clean-up tests - move fibgroup drop builder Signed-off-by: Fredi Raspall <[email protected]>
This is a manual revert of 6ca1bfe. RefCell cannot be used with more than one thread, even if threads would only ever borrow immutably the types within an Arc since the state kept within the RefCell to check borrows is not thread-safe and mutates, making it unusable. Signed-off-by: Fredi Raspall <[email protected]>
- clarify documentation. - ensure that installed fibgroups always have at least 1 fibentry. - ensure that fibroutes refer at least to one fibgroup. - Add guarded methods for lpm, that keep a guarded reference to the fib entry to use to forward a packet. These are NOT yet used and actually bring no extra safety, but allow performing lpm ops without explicitly "entering" the left-right protected fib. Signed-off-by: Fredi Raspall <[email protected]>
Two tests are added:
- one to validate new methods that provide guarded fibentries.
- one where multiple workers repeatedly do concurrent lookups
on the same fib and for the same destination, while another
thread adds/removes/modifies the route that best matches that
destination.
Signed-off-by: Fredi Raspall <[email protected]>
While a fib is being deleted, it may be actively looked up to forward packets from dataplane workers. Under heavy load, misaligned memory accesses have been observed when a fib is removed. This patch avoids those by "invalidating" a fib prior to its deletion and requiring readers to check the validity of the fib before attempting any lookup. Signed-off-by: Fredi Raspall <[email protected]>
Also add crate concurrency as dependency for later use. Note that no test is currently using shuttle/loom at the moment. Signed-off-by: Fredi Raspall <[email protected]>
7770cdf to
48d3ad6
Compare
There is a follow-up to this PR adding functionality that is later needed (cache refresh, iterators).
However, this is ready for review.
This is part of #909
Fixes #919