Skip to content

Conversation

@byrnedj
Copy link

@byrnedj byrnedj commented Aug 25, 2022

This is the first part of what was formerly the background data movement + tier admission PR


This change is Reviewable

Copy link

@vinser52 vinser52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 24 files at r1.
Reviewable status: 1 of 24 files reviewed, 13 unresolved discussions (waiting on @byrnedj and @vinser52)


cachelib/allocator/Cache.h line 100 at r1 (raw file):

  //
  // @param poolId    The pool id to query
  // @param tierId    The tier of the pool id

Should we extend the comment for the method above (that takes only poolId) that it returns a reference to the top-most tier memory pool?


cachelib/allocator/CacheAllocator.h line 39 at r1 (raw file):

#include <folly/Range.h>
#pragma GCC diagnostic pop

I would keep empty line to avoid merge conflicts in the future.


cachelib/allocator/CacheAllocator-inl.h line 376 at r1 (raw file):

                                             uint32_t creationTime,
                                             uint32_t expiryTime,
                                             bool fromEvictorThread) {

Where the fromEvictorThread variable is used?


cachelib/allocator/CacheAllocator-inl.h line 393 at r1 (raw file):

  void *memory = nullptr;

  if (tid == 0 && config_.acTopTierEvictionWatermark > 0.0

Since we are using some heuristics here, I think a comment is needed which describes the logic.

Why do we try to findEviction first? Even if we have less free space than threshhold I think it is better to try to allocate in this tier rather than having an eviction on the hot path. Or I do not understand the logic. So a comment is really desirable.


cachelib/allocator/CacheAllocator-inl.h line 476 at r1 (raw file):

  if (freePercentage <= config_.minAcAllocationWatermark)
    return defaultTargetTier + 1;

But it might have even less free space... I thought freePercentage is a parameter that BG evictor monitor to decided whether it should evict something from this tier to reach target freePercentage, isn't it?


cachelib/allocator/CacheAllocator-inl.h line 478 at r1 (raw file):

    return defaultTargetTier + 1;

  // TODO: we can even think about creating different allocation classes for PMEM

We should avoid PMEM term.


cachelib/allocator/CacheAllocator-inl.h line 500 at r1 (raw file):

  // TODO: only works for 2 tiers
  return (folly::Random::rand32() % 100) < config_.defaultTierChancePercentage ? defaultTargetTier : defaultTargetTier + 1;

Not sure about folly implementation but frequently rand function has an internal lock and having it on a hot path might brake concurrency


cachelib/allocator/CacheAllocator-inl.h line 1683 at r1 (raw file):

    TierId sourceTierId, TierId targetTierId, PoolId pid, Item& item)
{
  if (config_.disableEvictionToMemory)

it could be return !(config_.disableEvictionToMemory) to avoid branch.


cachelib/allocator/CacheAllocatorConfig.h line 594 at r1 (raw file):

  double acTopTierEvictionWatermark{0.0}; // TODO: make it per TIER?
  uint64_t sizeThresholdPolicy{0};   
  double defaultTierChancePercentage{50.0};

I think new config option should somehow described in the comments.


cachelib/allocator/MemoryTierCacheConfig.h line 72 at r1 (raw file):

  }

    // TODO: move it to MMContainer config

Looks like we need to address this TODO comment. Not sure why it should be per tier.


cachelib/allocator/memory/MemoryAllocator.h line 652 at r1 (raw file):

  }

 // TODO:

what is it?


cachelib/allocator/memory/MemoryAllocatorStats.h line 58 at r1 (raw file):

  }

  constexpr size_t getTotalMemory() const noexcept {

should it be renamed to getTotalAllocMemory?


cachelib/allocator/memory/MemoryPool.h line 311 at r1 (raw file):

  void setNumSlabsAdvised(uint64_t value) { curSlabsAdvised_ = value; }

 // TODO:

Again, we broken incapsulation. Not sure that we have to merge some hacks to develop branch.

Copy link

@vinser52 vinser52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 24 files reviewed, 15 unresolved discussions (waiting on @byrnedj and @vinser52)


cachelib/allocator/tests/AllocatorMemoryTiersTest.cpp.orig line 1 at r1 (raw file):

/*

why are these files part of PR?


cachelib/allocator/tests/AllocatorMemoryTiersTest.h.orig line 1 at r1 (raw file):

/*

why are these files part of PR?

igchor and others added 28 commits September 7, 2022 07:58
It's implementation is mostly based on PosixShmSegment.

Also, extend ShmManager and ShmSegmentOpts to support this new
segment type.
After introducing file segment type, nameToKey_ does not provide
enough information to recover/remove segments on restart.

This commit fixes that by replacing nameToKey_ with nameToOpts_.

Previously, the Key from nameToKey_ map was only used in a single
DCHECK().
* New class MemoryTierCacheConfig allows to configure a memory tier.
  Setting tier size and location of a file for file-backed memory are
  supported in this initial implementation;
* New member, vector of memory tiers, is added to class CacheAllocatorConfig.
* New test suite, chelib/allocator/tests/MemoryTiersTest.cpp,
  demonstrates the usage of and tests extended config API.
to allow using new configureMemoryTiers() API with legacy behavior.

Move validation code for memory tiers to validate() method and convert
ratios to sizes lazily (on get)..
It wrongly assumed that the only possible segment type is
PosixSysV segment.
Now it's size is 8 bytes intead of 4.

Original CompressedPtr stored only some offset with a memory Allocator.
For multi-tier implementation, this is not enough. We must also store
tierId and when uncompressing, select a proper allocator.

An alternative could be to just resign from CompressedPtr but they
are leveraged to allow the cache to be mapped to different addresses on shared memory.

Changing CompressedPtr impacted CacheItem size - it increased from 32 to 44 bytes.
Without this fix removeCb called even in case when Item is moved between
tiers.
It fails because CentOS is EOL. We might want to consider
using CentOS Streams but for now, just remove it.

Right now, we rely on build-cachelib-centos workflow anyway.
Disabled test suite allocator-test-AllocatorTypeTest to skip sporadically failing tests.
…m#43)

Compensation results in ratios being different than originially specified.
Return a sum of sizes of each tier instead of just 1st tier's size.
vinser52 and others added 8 commits September 12, 2022 13:16
Initial support of NUMA bindings
The issue was caused by incorrect behaviour of the
CacheAllocator<CacheTrait>::tryEvictToNextMemoryTier method in case the
evicted item is expired. We cannot simply return a handle to it, but we need
to remove it from the access container and MM container.
Fix ReaperSkippingSlabTraversalWhileSlabReleasing test
item movement test

updated class in test

use transparent sync for item movement

remove extra whitespace

updates per comments

updates per comments (still outstanding on the todos)
@byrnedj
Copy link
Author

byrnedj commented Sep 20, 2022

I rebased the PR, the outstanding issues from the review to address are:

cachelib/allocator/MemoryTierCacheConfig.h line 72 at r1 (raw file):

}

// TODO: move it to MMContainer config

Looks like we need to address this TODO comment. Not sure why it should be per tier.

cachelib/allocator/memory/MemoryAllocator.h line 652 at r1 (raw file):

}

// TODO:
what is it?

cachelib/allocator/memory/MemoryAllocatorStats.h line 58 at r1 (raw file):

}

constexpr size_t getTotalMemory() const noexcept {
should it be renamed to getTotalAllocMemory?

cachelib/allocator/memory/MemoryPool.h line 311 at r1 (raw file):

void setNumSlabsAdvised(uint64_t value) { curSlabsAdvised_ = value; }

// TODO:
Again, we broken incapsulation. Not sure that we have to merge some hacks to develop branch.

Copy link

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 24 files reviewed, 16 unresolved discussions (waiting on @byrnedj, @igchor, and @vinser52)


cachelib/allocator/CacheAllocator-inl.h line 393 at r1 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Since we are using some heuristics here, I think a comment is needed which describes the logic.

Why do we try to findEviction first? Even if we have less free space than threshhold I think it is better to try to allocate in this tier rather than having an eviction on the hot path. Or I do not understand the logic. So a comment is really desirable.

Yes, by default we should allocate in this tier (hence acTopTierEvictionWatermark is set to 0 in config). However, this condition allows you to change this behavior if you want (which can allow us to test performance of that).


cachelib/allocator/CacheAllocator-inl.h line 476 at r1 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

But it might have even less free space... I thought freePercentage is a parameter that BG evictor monitor to decided whether it should evict something from this tier to reach target freePercentage, isn't it?

Yes, BG monitors it, but we can still use it for comparison here. What do yoy mean by "it might have even less free space"?


cachelib/allocator/CacheAllocator-inl.h line 2508 at r2 (raw file):

            : 0);
    for (TierId tid = 0; tid < getNumTiers(); tid++) {
      if constexpr (std::is_same_v<MMConfig, MMLru::Config> || std::is_same_v<MMConfig, MM2Q::Config>) {

This should probably be fixed (this if was just a temp workaround).
The best way would probably be to move those 2 params to specific MMContainer config. Do we have support for setting MMCOntainer configs from JSON currently? And will it work for multi tier?


cachelib/allocator/memory/MemoryAllocator.h line 652 at r1 (raw file):

Previously, byrnedj (Daniel Byrne) wrote…

}

// TODO:
what is it?

private should be uncommented - if anything breaks, you need to add some extra function to the MemoryAllcoator


cachelib/allocator/memory/MemoryPool.h line 311 at r1 (raw file):

Previously, byrnedj (Daniel Byrne) wrote…

void setNumSlabsAdvised(uint64_t value) { curSlabsAdvised_ = value; }

// TODO:
Again, we broken incapsulation. Not sure that we have to merge some hacks to develop branch.

Agree, private should be uncommented - if anything breaks, you need to add some extra function to the MemoryPool

@byrnedj
Copy link
Author

byrnedj commented Sep 21, 2022

cachelib/allocator/CacheAllocator-inl.h line 2508 at r2 (raw file):

            : 0);
    for (TierId tid = 0; tid < getNumTiers(); tid++) {
      if constexpr (std::is_same_v<MMConfig, MMLru::Config> || std::is_same_v<MMConfig, MM2Q::Config>) {

This should probably be fixed (this if was just a temp workaround). The best way would probably be to move those 2 params to specific MMContainer config. Do we have support for setting MMCOntainer configs from JSON currently? And will it work for multi tier?

For this it is not trivial to add per tier mmContainer configs. Partly because in cachebench/Cache-inl.h, we have the following initialization:

    const size_t numBytes = cache_->getCacheMemoryStats().cacheSize;
    for (uint64_t i = 0; i < config_.numPools; ++i) {
      const double& ratio = config_.poolSizes[i];
      const size_t poolSize = static_cast<size_t>(numBytes * ratio);
      typename Allocator::MMConfig mmConfig =
          makeMMConfig<typename Allocator::MMConfig>(config_);
      const PoolId pid = cache_->addPool(
          folly::sformat("pool_{}", i), poolSize, {} /* allocSizes */, mmConfig,
          nullptr /* rebalanceStrategy */, nullptr /* resizeStrategy */,
          true /* ensureSufficientMem */);
      pools_.push_back(pid);
    }

The call to addPool initializes a pool that is split among tiers, so mmConfig is done per pool rather than per tier. For now I would suggest that we either:

  1. Accept the current implementation or
  2. Drop it from this PR and create a new PR that fully supports per-tier mmContainer creation (allowing us to have different allocators per tier too).

I have addressed the rest of the comments in the latest commit.

Copy link

@vinser52 vinser52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 23 files at r2.
Reviewable status: 2 of 24 files reviewed, 16 unresolved discussions (waiting on @byrnedj, @igchor, and @vinser52)


cachelib/allocator/CacheAllocator.h line 1291 at r3 (raw file):

  CacheAllocator(InitMemType types, Config config);
  
  TierId getTargetTierForItem(PoolId pid, typename Item::Key key,

I think we need a comment as for other methods in this class to desccribe the logic of this method.


cachelib/allocator/CacheAllocator.h line 1686 at r3 (raw file):

  WriteHandle tryEvictToNextMemoryTier(Item& item);

  bool shouldEvictToNextMemoryTier(TierId sourceTierId,

The same as previous, the comment is needed.


cachelib/allocator/CacheAllocator-inl.h line 393 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

Yes, by default we should allocate in this tier (hence acTopTierEvictionWatermark is set to 0 in config). However, this condition allows you to change this behavior if you want (which can allow us to test performance of that).

Ah, got it. Than some comment is nice to have to describe this logic.

BTW, do we have cases when this option improves performance somehow?


cachelib/allocator/CacheAllocator-inl.h line 476 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

Yes, BG monitors it, but we can still use it for comparison here. What do yoy mean by "it might have even less free space"?

As I understood, if default tier has no enough space than we return next tier. But what if next tier has even less free space?
Another question: will such logic works for cases with more than two tiers? I understand that today we are limiting number of tiers to 2. But what if the constant will be changed?


cachelib/allocator/CacheAllocator-inl.h line 528 at r3 (raw file):

                                             uint32_t expiryTime) {
  auto tid = getTargetTierForItem(pid, key, size, creationTime, expiryTime);
  return allocateInternalTier(tid, pid, key, size, creationTime, expiryTime);

Is it safe to remove the loop? I can imagine situation when getTargetTierForItem return tier 0. After that concurrent threads fills the tier 0 and make it full. After that our thread will be unable to allocate in tier 0 and return empty handle. I understand it is crazy situation, but what if?


cachelib/allocator/CacheAllocator-inl.h line 2566 at r3 (raw file):

  // TODO - get rid of the duplication - right now, each tier
  // holds pool objects with mostly the same info
  return filterCompactCachePools(allocator_[currentTier()]->getPoolIds());

@igchor should it be really currentTier() instead of always 0? As I remember you added currentTier() to be used for interaction with BG workers. In this case set of pools is the same for each tier (we have duplication) and we can always work with tier 0, isn’t it?


cachelib/allocator/MemoryTierCacheConfig.h line 72 at r1 (raw file):

Previously, byrnedj (Daniel Byrne) wrote…

}

// TODO: move it to MMContainer config

Looks like we need to address this TODO comment. Not sure why it should be per tier.

Not sure that I understand, but we have MMContainers per tier but config for MMcontainers can be the same for each tier. Is my assumption correct?

@byrnedj byrnedj mentioned this pull request Sep 21, 2022
Copy link

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 25 files reviewed, 16 unresolved discussions (waiting on @byrnedj, @igchor, and @vinser52)


cachelib/allocator/CacheAllocator-inl.h line 476 at r1 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

As I understood, if default tier has no enough space than we return next tier. But what if next tier has even less free space?
Another question: will such logic works for cases with more than two tiers? I understand that today we are limiting number of tiers to 2. But what if the constant will be changed?

  1. Yes, it might happen that next tier has even less space but at least this option gives you choice - right now we always allocate in topmost (and with this patch, default setting is the same - i.e. maxAcAllocationWatermark is 0). In general, we can try to apply some more dynamic threshold, use hill climbing approach, etc
  2. Right, currently this assumes there are only 2 tiers (some other config options do that as well). I think that if we would like to have this generic then we would have to introduce such allocationWatermarks per tier (or per each boundry between tiers)

cachelib/allocator/CacheAllocator-inl.h line 2508 at r2 (raw file):

Previously, byrnedj (Daniel Byrne) wrote…

For this it is not trivial to add per tier mmContainer configs. Partly because in cachebench/Cache-inl.h, we have the following initialization:

    const size_t numBytes = cache_->getCacheMemoryStats().cacheSize;
    for (uint64_t i = 0; i < config_.numPools; ++i) {
      const double& ratio = config_.poolSizes[i];
      const size_t poolSize = static_cast<size_t>(numBytes * ratio);
      typename Allocator::MMConfig mmConfig =
          makeMMConfig<typename Allocator::MMConfig>(config_);
      const PoolId pid = cache_->addPool(
          folly::sformat("pool_{}", i), poolSize, {} /* allocSizes */, mmConfig,
          nullptr /* rebalanceStrategy */, nullptr /* resizeStrategy */,
          true /* ensureSufficientMem */);
      pools_.push_back(pid);
    }

The call to addPool initializes a pool that is split among tiers, so mmConfig is done per pool rather than per tier. For now I would suggest that we either:

  1. Accept the current implementation or
  2. Drop it from this PR and create a new PR that fully supports per-tier mmContainer creation (allowing us to have different allocators per tier too).

I have addressed the rest of the comments in the latest commit.

I think we could drop the change, it might be worked on separately I belive.


cachelib/allocator/CacheAllocator-inl.h line 2566 at r3 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

@igchor should it be really currentTier() instead of always 0? As I remember you added currentTier() to be used for interaction with BG workers. In this case set of pools is the same for each tier (we have duplication) and we can always work with tier 0, isn’t it?

Right, in this particular place, 0 is OK.


cachelib/allocator/MemoryTierCacheConfig.h line 72 at r1 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Not sure that I understand, but we have MMContainers per tier but config for MMcontainers can be the same for each tier. Is my assumption correct?

I belive we would prefer to have config for each MMContainer (for each tier separately). This is because we only evict from DRAM but we evict AND promote from the second tier (which means we might want to customize insertion point for second tier, but not neceserilay for DRAM),

Copy link

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 25 files reviewed, 16 unresolved discussions (waiting on @byrnedj, @igchor, and @vinser52)


cachelib/allocator/CacheAllocator-inl.h line 393 at r1 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Ah, got it. Than some comment is nice to have to describe this logic.

BTW, do we have cases when this option improves performance somehow?

That's why I added this - to be able to benchmark to it :)

byrnedj pushed a commit that referenced this pull request Jul 23, 2023
Run long tests (navy/bench) every day on CI
@byrnedj byrnedj force-pushed the develop branch 2 times, most recently from 09d7bab to ebfca17 Compare May 21, 2024 13:24
@byrnedj byrnedj force-pushed the develop branch 4 times, most recently from 9ba4e79 to c1ff397 Compare June 26, 2024 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants