Improvements to the background workers #96
Conversation
Reviewed 14 of 29 files at r1.
Reviewable status: 14 of 29 files reviewed, 8 unresolved discussions (waiting on @byrnedj)
cachelib/allocator/BackgroundMover-inl.h line 82 at r1 (raw file):
auto assignedMemory = mutex.lock_combine([this] { return assignedMemory_; }); while (true) {
Why? Isn't it better to let PeriodicWorker schedule the work?
cachelib/allocator/BackgroundMoverStrategy.h line 61 at r1 (raw file):
}; /*
Can you remove it if it's not needed anymore?
cachelib/allocator/Cache.h line 84 at r1 (raw file):
bool operator<(const MemoryDescriptorType& rhs) { if (this->tid_ < rhs.tid_) return true;
nit: you can implement this like this:
return std::make_tuple(tid_, pid_, cid_) < std::make_tuple(rhs.tid_, rhs.pid_, rhs.cid_)
Same is true for operator==.
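A minimal sketch of the tuple-comparison suggestion (the struct below is a hypothetical stand-in; only the field names `tid_`, `pid_`, `cid_` come from this thread):

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for cachelib's MemoryDescriptorType; only the three
// id fields discussed in the review are modeled here.
struct MemoryDescriptorType {
  uint32_t tid_;
  uint32_t pid_;
  uint32_t cid_;

  bool operator<(const MemoryDescriptorType& rhs) const {
    // Lexicographic comparison over all fields in a single expression,
    // as suggested in the review.
    return std::make_tuple(tid_, pid_, cid_) <
           std::make_tuple(rhs.tid_, rhs.pid_, rhs.cid_);
  }

  bool operator==(const MemoryDescriptorType& rhs) const {
    return std::make_tuple(tid_, pid_, cid_) ==
           std::make_tuple(rhs.tid_, rhs.pid_, rhs.cid_);
  }
};
```

Besides being shorter, this form makes it impossible to forget a field or get the tie-breaking order wrong.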
cachelib/allocator/Cache.h line 108 at r1 (raw file):
}; inline bool operator<(const MemoryDescriptorType& lhs, const MemoryDescriptorType& rhs) {
nit: I don't think you need to implement this operator as a free function if you already have a member operator<
cachelib/allocator/Cache.h line 119 at r1 (raw file):
return false; } inline bool operator==(const MemoryDescriptorType& lhs, MemoryDescriptorType& rhs) {
nit: same here
cachelib/allocator/CacheAllocator.h line 2075 at r1 (raw file):
uint32_t currItems = mmContainer.size(); if (currItems < batch) { batch = currItems/2;
why / 2?
cachelib/allocator/CacheAllocator-inl.h line 444 at r1 (raw file):
return memory; } else { if (memory.size() != 0) {
I would suggest using some wrapper for freeing the memory - if you do this manually, you need to be very careful about potential exceptions.
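One generic way to get the wrapper the comment asks for is an RAII guard built on `std::unique_ptr` with a custom deleter. This is a sketch under assumptions: `releaseMemory` stands in for whatever routine actually returns the memory, and is not a CacheLib API.

```cpp
#include <cstdlib>
#include <memory>

// Hypothetical free routine; replace with the real deallocation call.
inline void releaseMemory(void* p) { std::free(p); }

// RAII guard: the memory is released on every exit path, including exceptions.
using MemoryGuard = std::unique_ptr<void, void (*)(void*)>;

inline MemoryGuard guardMemory(void* p) {
  return MemoryGuard(p, &releaseMemory);
}

// Example: even if the body throws, the guard frees the allocation on unwind,
// so no manual cleanup in catch blocks is needed.
inline int consume(void* p, bool fail) {
  auto g = guardMemory(p);  // guard owns the memory from here on
  if (fail) throw 42;       // freed on the exception path
  return 0;                 // ...and on normal return
}
```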
cachelib/allocator/CacheAllocator-inl.h line 1806 at r1 (raw file):
bool markMoving, bool fromBgThread) { std::vector<Item*> newAllocs;
I'm not sure if this makes a big difference but with so many vectors we will have quite a lot of dynamic memory allocations which can impact performance. You might want to consider putting all of those variables into a tuple/struct and having just a single vector (which you should reserve()).
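A sketch of the single-vector idea (the `Item` type and field names are illustrative placeholders, not the real cache types):

```cpp
#include <cstddef>
#include <vector>

struct Item {};  // stand-in for the cache item type

// One record per allocation, replacing several parallel vectors.
struct AllocRecord {
  Item* newAlloc;  // instead of std::vector<Item*> newAllocs
  Item* oldItem;   // instead of a second parallel vector
  bool marked;     // instead of a parallel std::vector<bool>
};

std::vector<AllocRecord> collect(std::size_t batch) {
  std::vector<AllocRecord> records;
  records.reserve(batch);  // a single up-front allocation for the whole batch
  for (std::size_t i = 0; i < batch; ++i) {
    records.push_back({nullptr, nullptr, false});
  }
  return records;
}
```

With `reserve()` the batch costs at most one dynamic allocation instead of one growth sequence per parallel vector.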
cachelib/allocator/memory/MemoryPool.cpp line 292 at r1 (raw file):
// allocation class ids that need slab to initiate an allocation. LockHolder l(lock_); auto allocs2 = ac.allocateBatch(remain);
I don't quite get the logic here. If we failed to make batch allocations before, why do you expect it to succeed now?
Reviewed 1 of 29 files at r1, all commit messages.
Reviewable status: 14 of 29 files reviewed, 11 unresolved discussions (waiting on @byrnedj and @igchor)
cachelib/CMakeLists.txt line 46 at r1 (raw file):
option(BUILD_TESTS "If enabled, compile the tests." ON) option(WITH_DTO "If enabled, build with DSA transparent offloading." OFF)
nit: I would add BUILD_ prefix to WITH_DTO: BUILD_WITH_DTO.
cachelib/allocator/BackgroundMover.h line 68 at r1 (raw file):
BackgroundMoverStats getStats() const noexcept; std::map<MemoryDescriptorType,uint64_t> getClassStats() const noexcept;
nit: I would add an alias for the std::map<MemoryDescriptorType,uint64_t>, e.g. using class_moves_map_type = std::map<MemoryDescriptorType,uint64_t>.
And use this alias everywhere, to avoid bulk renaming (like right now you need to update multiple places) in the future.
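A sketch of the alias suggestion (the stand-in struct and getter below are hypothetical; only the alias name `class_moves_map_type` comes from the comment above):

```cpp
#include <cstdint>
#include <map>

// Simplified stand-in; the real MemoryDescriptorType lives in
// cachelib/allocator/Cache.h.
struct MemoryDescriptorType {
  uint32_t tid_, pid_, cid_;
  bool operator<(const MemoryDescriptorType& rhs) const {
    if (tid_ != rhs.tid_) return tid_ < rhs.tid_;
    if (pid_ != rhs.pid_) return pid_ < rhs.pid_;
    return cid_ < rhs.cid_;
  }
};

// One alias used at every call site, so a future container change
// is a one-line edit instead of a bulk rename.
using class_moves_map_type = std::map<MemoryDescriptorType, uint64_t>;

class_moves_map_type getClassStats() { return class_moves_map_type{}; }
```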
cachelib/allocator/MMTinyLFU.h line 330 at r1 (raw file):
// is unchanged. bool add(T& node) noexcept; bool addBatch(std::vector<T*>& nodes) noexcept;
nit: it is better practice to pass begin and end iterators, to make the interface container-independent. For example:
template <class InputIt>
bool addBatch(InputIt first, InputIt last) noexcept;
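A minimal sketch of the iterator-range interface, with a simplified stand-in container (not the real MMTinyLFU code):

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Simplified stand-in for an MM container; only the batch-add shape matters.
template <typename T>
class Container {
 public:
  // Accepts any input-iterator range over T*, so callers are not tied to
  // std::vector.
  template <class InputIt>
  bool addBatch(InputIt first, InputIt last) noexcept {
    for (auto it = first; it != last; ++it) {
      nodes_.push_back(*it);
    }
    return true;
  }

  std::size_t size() const noexcept { return nodes_.size(); }

 private:
  std::vector<T*> nodes_;
};
```

Callers can then pass `v.begin(), v.end()` for a vector, a `std::list`'s iterators, or any other range without copying into a vector first.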
Reviewable status: 14 of 29 files reviewed, 12 unresolved discussions (waiting on @byrnedj and @igchor)
cachelib/allocator/MMTinyLFU-inl.h line 218 at r1 (raw file):
template <typename T, MMTinyLFU::Hook<T> T::*HookPtr> bool MMTinyLFU::Container<T, HookPtr>::addBatch(std::vector<T*>& nodes) noexcept { return false;
Why is it not implemented?
Reviewable status: 14 of 29 files reviewed, 14 unresolved discussions (waiting on @byrnedj and @igchor)
cachelib/allocator/BackgroundMover-inl.h line 39 at r1 (raw file):
std::memory_order_relaxed); maxTraversalTimeNs_.store(std::max(maxTraversalTimeNs_.load(), nsTaken), std::memory_order_relaxed);
The implementation does not look thread-safe, and as I understand it, recordTraversalTime is called by a single thread. If so, why do we need to use std::atomic<> to store the statistic?
cachelib/allocator/BackgroundMover-inl.h line 102 at r1 (raw file):
auto end = util::getCurrentTimeNs(); if (moves > 0) { traversalStats_.recordTraversalTime(end > begin ? end - begin : 0);
Why do we need to check end > begin? Is it an overflow check? If so, we need better handling for overflow: it is better to skip the sample than to record 0.
If it is not about overflow, could you please describe when end < begin is possible?
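A sketch of the "skip rather than record 0" handling suggested here (names are illustrative, not the actual BackgroundMover code):

```cpp
#include <cstdint>

// If the clock ever yields end < begin, skip the sample entirely instead of
// recording 0, so min/avg statistics are not polluted by bogus values.
struct TraversalStats {
  uint64_t numTraversals = 0;
  uint64_t totalNs = 0;

  void record(uint64_t beginNs, uint64_t endNs) {
    if (endNs < beginNs) {
      return;  // bogus sample: drop it rather than record 0
    }
    ++numTraversals;
    totalNs += endNs - beginNs;
  }

  uint64_t avgNs() const {
    return numTraversals ? totalNs / numTraversals : 0;
  }
};
```

Dropping the sample keeps both the traversal count and the running total consistent with each other, which recording a zero would not.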
Reviewable status: 9 of 29 files reviewed, 6 unresolved discussions (waiting on @igchor and @vinser52)
cachelib/CMakeLists.txt line 46 at r1 (raw file):
Previously, vinser52 (Sergei Vinogradov) wrote…
nit: I would add a BUILD_ prefix to WITH_DTO: BUILD_WITH_DTO.
Done.
cachelib/allocator/BackgroundMover-inl.h line 39 at r1 (raw file):
Previously, vinser52 (Sergei Vinogradov) wrote…
The implementation does not look thread-safe, and as I understand it, recordTraversalTime is called by a single thread. If so, why do we need to use std::atomic<> to store the statistic?
You're right - these don't need to be atomics - I have updated them and some other counters since no other thread will access them.
cachelib/allocator/BackgroundMover-inl.h line 82 at r1 (raw file):
Previously, igchor (Igor Chorążewicz) wrote…
Why? Isn't it better to let PeriodicWorker schedule the work?
I find that I get 20% more background evictions with this loop, which translates to some improvement in throughput. The worker will run as long as there is work to do - it avoids jumping out and being rescheduled in the case that there are still items to evict.
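The drain-loop idea described above can be sketched like this (a simplification under assumptions: `doBatch` is a hypothetical stand-in for one background-eviction pass returning the number of items moved):

```cpp
#include <cstddef>

// Instead of doing one batch per PeriodicWorker wake-up, keep working while
// there is still work queued; only yield back to the scheduler when a pass
// moves nothing.
template <typename BatchFn>
std::size_t drain(BatchFn doBatch) {
  std::size_t total = 0;
  while (true) {
    std::size_t moved = doBatch();
    if (moved == 0) {
      break;  // nothing left; let the periodic scheduler reschedule us
    }
    total += moved;
  }
  return total;
}
```

The trade-off is between lower rescheduling overhead (fewer wake-ups while work is pending) and a longer-running worker iteration.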
cachelib/allocator/BackgroundMover-inl.h line 102 at r1 (raw file):
Previously, vinser52 (Sergei Vinogradov) wrote…
Why do we need to check end > begin? Is it an overflow check? If so, we need better handling for overflow: it is better to skip the sample than to record 0. If it is not about overflow, could you please describe when it is possible?
It is just for safety - so that the subtraction end - begin is non-negative and recordTraversalTime always receives a non-negative value. I just copied this logic from the Reaper thread.
cachelib/allocator/MMTinyLFU-inl.h line 218 at r1 (raw file):
Previously, vinser52 (Sergei Vinogradov) wrote…
why it is not implemented?
Done.
cachelib/allocator/memory/MemoryPool.cpp line 292 at r1 (raw file):
Previously, igchor (Igor Chorążewicz) wrote…
I don't quite get the logic here. If we failed to make batch allocations before, why do you expect it to succeed now?
Because we hold the lock - this is the logic from the original allocate call.
@byrnedj could you please split the changes logically into commits (and remove commits with fixes after review, etc.)? I would prefer not to squash all of them on merge, so that it's easier to upstream it. Also, would it be possible to share the implementation of findEvictionBatch with findEviction? Or should we consider that after merging this and rebasing (since there are some changes in upstream, I think)?
Reviewed 1 of 6 files at r3, all commit messages.
Reviewable status: 7 of 30 files reviewed, 7 unresolved discussions (waiting on @byrnedj and @igchor)
cachelib/allocator/BackgroundMover.h line 56 at r3 (raw file):
class BackgroundMover : public PeriodicWorker { public: using ClassBgStatsType = std::map<MemoryDescriptorType,uint64_t>;
do you really need an ordered map?
cachelib/allocator/BackgroundMover.h line 81 at r3 (raw file):
void recordTraversalTime(uint64_t nsTaken); uint64_t getAvgTraversalTimeNs(uint64_t numTraversals) const;
Why do we need to pass numTraversals as a parameter? Why can't we use the numTraversals_ member?
cachelib/allocator/BackgroundMover-inl.h line 38 at r3 (raw file):
minTraversalTimeNs_ = std::min(minTraversalTimeNs_, nsTaken); maxTraversalTimeNs_ = std::max(maxTraversalTimeNs_, nsTaken); totalTraversalTimeNs_ += nsTaken;
Should we also increment numTraversals_?
cachelib/allocator/BackgroundMover-inl.h line 44 at r3 (raw file):
uint64_t BackgroundMover<CacheT>::TraversalStats::getAvgTraversalTimeNs( uint64_t numTraversals) const { return numTraversals ? totalTraversalTimeNs_ / numTraversals : 0;
If numTraversals equals 0, can we use the numTraversals_ member and return totalTraversalTimeNs_ / numTraversals_?
cachelib/allocator/BackgroundMover-inl.h line 131 at r3 (raw file):
template <typename CacheT> std::map<MemoryDescriptorType,uint64_t>
Use the ClassBgStatsType alias instead of std::map<MemoryDescriptorType,uint64_t>
cachelib/allocator/BackgroundMoverStrategy.h line 51 at r3 (raw file):
batches.push_back(batch); } else { batches.push_back(0);
What does a batch size of 0 mean? Why should we add 0 to the returned vector?
Reviewed 2 of 17 files at r2.
Reviewable status: 9 of 30 files reviewed, 9 unresolved discussions (waiting on @byrnedj and @igchor)
cachelib/allocator/MMLru-inl.h line 225 at r3 (raw file):
return lruMutex_->lock_combine([this, begin, end, currTime]() { uint32_t i = 0; for (auto itr = begin; itr != end; itr++) {
nit: the prefix form is more appropriate here: ++itr
cachelib/allocator/MMLru-inl.h line 227 at r3 (raw file):
for (auto itr = begin; itr != end; itr++) { T* node = *itr; if (node->isInMMContainer()) {
How can the node already be in the MM container?
Reviewable status: 9 of 30 files reviewed, 7 unresolved discussions (waiting on @igchor and @vinser52)
cachelib/allocator/BackgroundMover.h line 56 at r3 (raw file):
Previously, vinser52 (Sergei Vinogradov) wrote…
do you really need an ordered map?
Yes, because it is nice to iterate through the (cid, pid, tid) keys in order during stats output. It also avoids implementing the hash function that unordered_map requires.
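For reference, the hash the answer mentions would be small. A sketch, assuming the three fields from this thread (the combine scheme is a boost-style hash_combine, an assumption, not CacheLib code):

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Stand-in for the real type in cachelib/allocator/Cache.h.
struct MemoryDescriptorType {
  uint32_t tid_, pid_, cid_;
  bool operator==(const MemoryDescriptorType& o) const {
    return tid_ == o.tid_ && pid_ == o.pid_ && cid_ == o.cid_;
  }
};

// Minimal hash combining the three ids (boost-style hash_combine constants).
struct MemoryDescriptorHash {
  std::size_t operator()(const MemoryDescriptorType& d) const {
    std::size_t h = std::hash<uint32_t>{}(d.tid_);
    h ^= std::hash<uint32_t>{}(d.pid_) + 0x9e3779b9 + (h << 6) + (h >> 2);
    h ^= std::hash<uint32_t>{}(d.cid_) + 0x9e3779b9 + (h << 6) + (h >> 2);
    return h;
  }
};

using UnorderedClassStats =
    std::unordered_map<MemoryDescriptorType, uint64_t, MemoryDescriptorHash>;
```

Whether O(1) lookup outweighs losing sorted iteration for stats output is the design choice being weighed here; with a small, fixed number of classes either container is cheap.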
cachelib/allocator/BackgroundMover.h line 81 at r3 (raw file):
Previously, vinser52 (Sergei Vinogradov) wrote…
Why do we need to pass numTraversals as a parameter? Why can't we use the numTraversals_ member?
Done.
cachelib/allocator/BackgroundMover-inl.h line 38 at r3 (raw file):
Previously, vinser52 (Sergei Vinogradov) wrote…
Should we also increment numTraversals_?
Fixed the use of numTraversals - we now use the count from BackgroundMover and removed the duplicate numTraversals_ member.
cachelib/allocator/BackgroundMover-inl.h line 44 at r3 (raw file):
Previously, vinser52 (Sergei Vinogradov) wrote…
If numTraversals equals 0, can we use the numTraversals_ member and return totalTraversalTimeNs_ / numTraversals_?
Done.
cachelib/allocator/BackgroundMover-inl.h line 131 at r3 (raw file):
Previously, vinser52 (Sergei Vinogradov) wrote…
Use the ClassBgStatsType alias instead of std::map<MemoryDescriptorType,uint64_t>.
Done.
cachelib/allocator/BackgroundMoverStrategy.h line 51 at r3 (raw file):
Previously, vinser52 (Sergei Vinogradov) wrote…
What does a batch size of 0 mean? Why should we add 0 to the returned vector?
Added a comment - it means this class has already reached the % free space target.
cachelib/allocator/MMLru-inl.h line 227 at r3 (raw file):
Previously, vinser52 (Sergei Vinogradov) wrote…
How the node may be already in the MM container?
It shouldn't be; I changed this to a runtime_error.
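The invariant check described in this reply could look roughly like the following sketch (the `Node` type and `isInMMContainer` stand-in are illustrative, not the real item API):

```cpp
#include <stdexcept>

// Stand-in for a cache item; isInMMContainer mimics the flag discussed above.
struct Node {
  bool inContainer = false;
  bool isInMMContainer() const { return inContainer; }
};

// A node handed to addBatch must not already be linked into an MM container;
// finding one indicates a logic error, so fail loudly instead of silently
// skipping it.
inline void checkNotLinked(const Node& n) {
  if (n.isInMMContainer()) {
    throw std::runtime_error("node already in MM container");
  }
}
```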
Force-pushed from 36129e4 to c67df92.
I have done so. Once the rebase is finished, I will rebase this branch and we can merge it.
Sounds good. Regarding sharing the implementation - we can probably do it later; there might be some refactoring needed anyway, or some comments from Meta.
Adds the following:
- approx usage stat
- evictions / attempts per class
Force-pushed from c67df92 to 57eade8.
The main improvements are:
The results show nearly a 20% improvement in multi-tier (develop branch) performance with this config:
simple_tiers_test_bg.txt