
@byrnedj byrnedj commented Sep 8, 2023

The main improvements are:

  1. bulk add/remove from mmContainer
  2. bulk eviction
  3. better scheduling of background threads and calculating batch sizes
  4. improved stats (including those from https://github.com/guptask/CacheLib/tree/dsa_bgknd_bulk_evictions)
  5. ability for background workers to evict/promote without the mark-moving bit - we can use acquire instead

The result is a nearly 20% improvement in multi-tier (develop branch) performance with this config:
simple_tiers_test_bg.txt




@igchor igchor left a comment


Reviewed 14 of 29 files at r1.
Reviewable status: 14 of 29 files reviewed, 8 unresolved discussions (waiting on @byrnedj)


cachelib/allocator/BackgroundMover-inl.h line 82 at r1 (raw file):

  auto assignedMemory = mutex.lock_combine([this] { return assignedMemory_; });

  while (true) {

Why? Isn't it better to let PeriodicWorker schedule the work?


cachelib/allocator/BackgroundMoverStrategy.h line 61 at r1 (raw file):

};

/*

Can you remove it if it's not needed anymore?


cachelib/allocator/Cache.h line 84 at r1 (raw file):

    bool operator<(const MemoryDescriptorType& rhs) {
        if (this->tid_ < rhs.tid_) return true;

nit: you can implement this like this:
return std::make_tuple(tid_, pid_, cid_) < std::make_tuple(rhs.tid_, rhs.pid_, rhs.cid_);

Same is true for operator==.


cachelib/allocator/Cache.h line 108 at r1 (raw file):

};

inline bool operator<(const MemoryDescriptorType& lhs, const MemoryDescriptorType& rhs) {

nit: I don't think you need to implement this operator as a free function if you already have a member operator<


cachelib/allocator/Cache.h line 119 at r1 (raw file):

  return false;
}
inline bool operator==(const MemoryDescriptorType& lhs, MemoryDescriptorType& rhs) {

nit: same here


cachelib/allocator/CacheAllocator.h line 2075 at r1 (raw file):

    uint32_t currItems = mmContainer.size();
    if (currItems < batch) {
        batch = currItems/2;

why / 2?


cachelib/allocator/CacheAllocator-inl.h line 444 at r1 (raw file):

      return memory;
    } else {
      if (memory.size() != 0) {

I would suggest using some wrapper for freeing the memory - if you do this manually, you need to be very careful about potential exceptions.


cachelib/allocator/CacheAllocator-inl.h line 1806 at r1 (raw file):

                                             bool markMoving,
                                             bool fromBgThread) {
  std::vector<Item*> newAllocs;

I'm not sure if this makes a big difference but with so many vectors we will have quite a lot of dynamic memory allocations which can impact performance. You might want to consider putting all of those variables into a tuple/struct and having just a single vector (which you should reserve()).


cachelib/allocator/memory/MemoryPool.cpp line 292 at r1 (raw file):

  // allocation class ids that need slab to initiate an allocation.
  LockHolder l(lock_);
  auto allocs2 = ac.allocateBatch(remain);

I don't quite get the logic here. If we failed to make batch allocations before, why do you expect it to succeed now?


@vinser52 vinser52 left a comment


Reviewed 1 of 29 files at r1, all commit messages.
Reviewable status: 14 of 29 files reviewed, 11 unresolved discussions (waiting on @byrnedj and @igchor)


cachelib/CMakeLists.txt line 46 at r1 (raw file):

option(BUILD_TESTS "If enabled, compile the tests." ON)
option(WITH_DTO "If enabled, build with DSA transparent offloading." OFF)

nit: I would add BUILD_ prefix to WITH_DTO: BUILD_WITH_DTO.


cachelib/allocator/BackgroundMover.h line 68 at r1 (raw file):

  BackgroundMoverStats getStats() const noexcept;
  std::map<MemoryDescriptorType,uint64_t> getClassStats() const noexcept;

nit: I would add an alias for the std::map<MemoryDescriptorType,uint64_t>, e.g. using class_moves_map_type = std::map<MemoryDescriptorType,uint64_t>;
And use this alias everywhere, to avoid bulk renaming in the future (like right now, where you need to update multiple places).


cachelib/allocator/MMTinyLFU.h line 330 at r1 (raw file):

    //          is unchanged.
    bool add(T& node) noexcept;
    bool addBatch(std::vector<T*>& nodes) noexcept;

nit: it is better practice to pass begin and end iterators to make the interface container-independent. For example:

template <class InputIt>
bool addBatch(InputIt first, InputIt last) noexcept;


@vinser52 vinser52 left a comment


Reviewable status: 14 of 29 files reviewed, 12 unresolved discussions (waiting on @byrnedj and @igchor)


cachelib/allocator/MMTinyLFU-inl.h line 218 at r1 (raw file):

template <typename T, MMTinyLFU::Hook<T> T::*HookPtr>
bool MMTinyLFU::Container<T, HookPtr>::addBatch(std::vector<T*>& nodes) noexcept {
    return false;

Why is it not implemented?


@vinser52 vinser52 left a comment


Reviewable status: 14 of 29 files reviewed, 14 unresolved discussions (waiting on @byrnedj and @igchor)


cachelib/allocator/BackgroundMover-inl.h line 39 at r1 (raw file):

                            std::memory_order_relaxed);
  maxTraversalTimeNs_.store(std::max(maxTraversalTimeNs_.load(), nsTaken),
                            std::memory_order_relaxed);

The implementation does not look thread-safe, and as far as I can tell recordTraversalTime is called by a single thread. If so, why do we need std::atomic<> to store the statistic?


cachelib/allocator/BackgroundMover-inl.h line 102 at r1 (raw file):

    auto end = util::getCurrentTimeNs();
    if (moves > 0) {
      traversalStats_.recordTraversalTime(end > begin ? end - begin : 0);

Why do we need to check end > begin? Is it an overflow check? If so, we need better overflow handling: it is better to skip the sample than to record 0.
If it is not about overflow, could you please describe when end < begin is possible?


@byrnedj byrnedj left a comment


Reviewable status: 9 of 29 files reviewed, 6 unresolved discussions (waiting on @igchor and @vinser52)


cachelib/CMakeLists.txt line 46 at r1 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

nit: I would add BUILD_ prefix to WITH_DTO: BUILD_WITH_DTO.

done


cachelib/allocator/BackgroundMover-inl.h line 39 at r1 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

The implementation does not look thread-safe, and as far as I can tell recordTraversalTime is called by a single thread. If so, why do we need std::atomic<> to store the statistic?

You're right - these don't need to be atomics - I have updated them and some other counters since no other thread will access them.


cachelib/allocator/BackgroundMover-inl.h line 82 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

Why? Isn't it better to let PeriodicWorker schedule the work?

I find that I get 20% more background evictions with this loop, which translates to some improvement in throughput. The worker runs as long as there is work to do - it avoids jumping out and being rescheduled when there are still items to evict.


cachelib/allocator/BackgroundMover-inl.h line 102 at r1 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Why do we need to check end > begin? Is it an overflow check? If so, we need better overflow handling: it is better to skip the sample than to record 0.
If it is not about overflow, could you please describe when end < begin is possible?

It is just for safety - so that end - begin is non-negative and recordTraversalTime always receives a non-negative value. I copied this logic from the Reaper thread.


cachelib/allocator/MMTinyLFU-inl.h line 218 at r1 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Why is it not implemented?

Done.


cachelib/allocator/memory/MemoryPool.cpp line 292 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

I don't quite get the logic here. If we failed to make batch allocations before, why do you expect it to succeed now?

Because we hold the lock - this is the same logic as the original allocate call.

@igchor igchor requested a review from vinser52 November 9, 2023 19:02

igchor commented Nov 20, 2023

@byrnedj could you please split the changes logically into commits (and remove commits with fixes after review, etc.)? I would prefer not to squash all of them on merge, so that it's easier to upstream it.

Also, would it be possible to share the implementation of findEvictionBatch with findEviction? Or should we consider that after merging this and rebasing (since there are some changes in upstream, I think)?


@vinser52 vinser52 left a comment


Reviewed 1 of 6 files at r3, all commit messages.
Reviewable status: 7 of 30 files reviewed, 7 unresolved discussions (waiting on @byrnedj and @igchor)


cachelib/allocator/BackgroundMover.h line 56 at r3 (raw file):

class BackgroundMover : public PeriodicWorker {
 public:
  using ClassBgStatsType = std::map<MemoryDescriptorType,uint64_t>;

do you really need an ordered map?


cachelib/allocator/BackgroundMover.h line 81 at r3 (raw file):

    void recordTraversalTime(uint64_t nsTaken);

    uint64_t getAvgTraversalTimeNs(uint64_t numTraversals) const;

Why do we need to pass numTraversals as a parameter? Why can't we use the numTraversals_ member?


cachelib/allocator/BackgroundMover-inl.h line 38 at r3 (raw file):

  minTraversalTimeNs_ = std::min(minTraversalTimeNs_, nsTaken);
  maxTraversalTimeNs_ = std::max(maxTraversalTimeNs_, nsTaken);
  totalTraversalTimeNs_ += nsTaken;

Should we also increment numTraversals_?


cachelib/allocator/BackgroundMover-inl.h line 44 at r3 (raw file):

uint64_t BackgroundMover<CacheT>::TraversalStats::getAvgTraversalTimeNs(
    uint64_t numTraversals) const {
  return numTraversals ? totalTraversalTimeNs_ / numTraversals : 0;

If numTraversals equals 0, can we use the numTraversals_ member and return totalTraversalTimeNs_ / numTraversals_?


cachelib/allocator/BackgroundMover-inl.h line 131 at r3 (raw file):

template <typename CacheT>
std::map<MemoryDescriptorType,uint64_t>

Use the ClassBgStatsType alias instead of std::map<MemoryDescriptorType,uint64_t>


cachelib/allocator/BackgroundMoverStrategy.h line 51 at r3 (raw file):

          batches.push_back(batch);
        } else {
          batches.push_back(0);

What does a batch size of 0 mean? Why should we add 0 to the returned vector?


@vinser52 vinser52 left a comment


Reviewed 2 of 17 files at r2.
Reviewable status: 9 of 30 files reviewed, 9 unresolved discussions (waiting on @byrnedj and @igchor)


cachelib/allocator/MMLru-inl.h line 225 at r3 (raw file):

  return lruMutex_->lock_combine([this, begin, end, currTime]() {
    uint32_t i = 0;
    for (auto itr = begin; itr != end; itr++) {

nit: the prefix form is preferable here: ++itr


cachelib/allocator/MMLru-inl.h line 227 at r3 (raw file):

    for (auto itr = begin; itr != end; itr++) {
      T* node = *itr;
      if (node->isInMMContainer()) {

How can the node already be in the MM container?


@byrnedj byrnedj left a comment


Reviewable status: 9 of 30 files reviewed, 7 unresolved discussions (waiting on @igchor and @vinser52)


cachelib/allocator/BackgroundMover.h line 56 at r3 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

do you really need an ordered map?

Yes, because it is nice to iterate through the (cid,pid,tid) keys in order during stats output. It also avoids implementing the hash function that unordered_map requires.


cachelib/allocator/BackgroundMover.h line 81 at r3 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Why do we need to pass numTraversals as a parameter? Why can't we use the numTraversals_ member?

Done.


cachelib/allocator/BackgroundMover-inl.h line 38 at r3 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Should we also increment numTraversals_?

Fixed the use of numTraversals - we will use the count from BackgroundMover and remove the duplicate numTraversals_.


cachelib/allocator/BackgroundMover-inl.h line 44 at r3 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

If numTraversals equals 0, can we use the numTraversals_ member and return totalTraversalTimeNs_ / numTraversals_?

Done.


cachelib/allocator/BackgroundMover-inl.h line 131 at r3 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Use the ClassBgStatsType alias instead of std::map<MemoryDescriptorType,uint64_t>

Done.


cachelib/allocator/BackgroundMoverStrategy.h line 51 at r3 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

What does a batch size of 0 mean? Why should we add 0 to the returned vector?

Added a comment - it means we have already reached the % free-space target in this class.


cachelib/allocator/MMLru-inl.h line 227 at r3 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

How can the node already be in the MM container?

It shouldn't be; I changed this to throw a runtime_error.

@byrnedj byrnedj force-pushed the bg-improvements-no-pool-rebal branch from 36129e4 to c67df92 Compare February 28, 2024 21:27

byrnedj commented Feb 28, 2024

@byrnedj could you please split the changes logically into commits (and remove commits with fixes after review, etc.)? I would prefer not to squash all of them on merge, so that it's easier to upstream it.

Also, would it be possible to share the implementation of findEvictionBatch with findEviction? Or should we consider that after merging this and rebasing (since there are some changes in upstream, I think)?

I have done so. Once the rebase is finished I will rebase this branch and we can merge it.


igchor commented Mar 4, 2024

@byrnedj could you please split the changes logically into commits (and remove commits with fixes after review, etc.)? I would prefer not to squash all of them on merge, so that it's easier to upstream it.
Also, would it be possible to share the implementation of findEvictionBatch with findEviction? Or should we consider that after merging this and rebasing (since there are some changes in upstream, I think)?

I have done so. Once the rebase is finished I will rebase this branch and we can merge it.

Sounds good. Regarding sharing the implementation - we can probably do it later, there might be some refactoring needed anyway or some comments from meta.

@byrnedj byrnedj force-pushed the bg-improvements-no-pool-rebal branch from c67df92 to 57eade8 Compare March 25, 2024 19:20
@byrnedj byrnedj merged commit bd69c27 into intel:develop Mar 26, 2024