Skip to content

Conversation

@igchor
Copy link

@igchor igchor commented Nov 16, 2021

This change is Reviewable

@igchor igchor force-pushed the poc_tiers branch 2 times, most recently from cbd1e14 to 341a94b Compare November 30, 2021 15:10
Copy link
Collaborator

@guptask guptask left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 8 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @igchor)


cachelib/allocator/Cache.h, line 261 at r2 (raw file):

  unsigned getNumTiers() const;

  unsigned numTiers;

this should be renamed to numTiers_ or m_numTiers. I'm not sure about the convention to follow since cachelib doesn't seem to adhere strictly to one convention. Also you can assign default value of 1 to this variable even though your CacheBase constructor change does address it.


cachelib/allocator/CacheAllocator.h, line 588 at r2 (raw file):

  //
  // @throw std::invalid_argument if the poolId is invalid
  void overridePoolConfig(TierId tid, PoolId pid, const MMConfig& config);

this is more of a design thought - we can bit pack the tier id and pool id together. It has been used in multiple places. That would make the pool id unique.


cachelib/allocator/CacheAllocator.h, line 1283 at r2 (raw file):

  // @return true  If the move was completed, and the containers were updated
  //               successfully.
  bool moveRegularItemOnEviction(Item& oldItem, ItemHandle& newItemHdl);

just a thought - how much impact does memcpy have here ?

Copy link
Collaborator

@vinser52 vinser52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @igchor)


cachelib/allocator/CacheAllocator-inl.h, line 1792 at r2 (raw file):

CacheAllocator<CacheTrait>::getTierId(const Item& item) const {
  // TODO: add tierID member to the Item or look at the
  // item.getMemory() and compare it with all the tier allocators

We need to decide how to implement this functionality. There are a lot of places where it is used, but it is not implemented yet.

Copy link
Author

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 6 of 11 files reviewed, 4 unresolved discussions (waiting on @guptask and @vinser52)


cachelib/allocator/Cache.h, line 261 at r2 (raw file):

Previously, guptask (Sounak Gupta) wrote…

this should be renamed to numTiers_ or m_numTiers. I'm not sure about the convention to follow since cachelib doesn't seem to adhere strictly to one convention. Also you can assign default value of 1 to this variable even though your CacheBase constructor change does address it.

Done.


cachelib/allocator/CacheAllocator.h, line 588 at r2 (raw file):

Previously, guptask (Sounak Gupta) wrote…

this is more of a design thought - we can bit pack the tier id and pool id together. It has been used in multiple places. That would make the pool id unique.

We thought about this but it would make the API somewhat inconsistent - in some places we would use PoolId as both pool and tier descriptor and in others (e.g. in public API) we would only be interested in Pool id.


cachelib/allocator/CacheAllocator.h, line 1283 at r2 (raw file):

Previously, guptask (Sounak Gupta) wrote…

just a thought - how much impact does memcpy have here ?

That's a good question - I don't know right now but it might be useful to collect some perf/vtune data (we just need some workload)


cachelib/allocator/CacheAllocator-inl.h, line 1792 at r2 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

We need to decide how to implement this functionality. There are a lot of places where it is used, but it is not implemented yet.

Done.

Copy link
Collaborator

@vinser52 vinser52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 8 files at r1, 1 of 2 files at r2, 2 of 5 files at r3, 4 of 4 files at r4, all commit messages.
Reviewable status: all files reviewed, 16 unresolved discussions (waiting on @guptask and @igchor)


cachelib/allocator/CacheAllocator.h, line 629 at r4 (raw file):

  //          smaller than _bytes_
  // @throw   std::invalid_argument if the poolId is invalid.
  // TODO: should call shrinkPool for specific tier?

Is this TODO still relevant? We call currentTier() inside implementation. Another question can this method be called by user? If so, it is a possible issue because the user should not set currentTier.


cachelib/allocator/CacheAllocator.h, line 640 at r4 (raw file):

  //            bytes were not available.
  // @throw     std::invalid_argument if the poolId is invalid.
  // TODO: should call growPool for specific tier?

The same comment as above


cachelib/allocator/CacheAllocator.h, line 1609 at r4 (raw file):

    // detection.
    folly::annotate_ignore_thread_sanitizer_guard g(__FILE__, __LINE__);
    allocator_[0 /* TODO */]->forEachAllocation(std::forward<Fn>(f));

Should we fix this TODO.


cachelib/allocator/CacheAllocator.h, line 1676 at r4 (raw file):

  typename Item::PtrCompressor createPtrCompressor() const {
    return allocator_[0 /* TODO */]->createPtrCompressor<Item>();

Should we fix this TODO.


cachelib/allocator/CacheAllocator.h, line 1754 at r4 (raw file):

  TierId currentTier() const {
    return 0; // TODO

I thoughts you already implemented this function.


cachelib/allocator/CacheAllocator-inl.h, line 1476 at r4 (raw file):

typename CacheAllocator<CacheTrait>::ItemHandle
CacheAllocator<CacheTrait>::advanceIteratorAndTryEvictRegularItem(
    TierId tid, PoolId pid, MMContainer& sourceMMContainer, EvictionIterator& itr) {

why we renamed mmContainer -> sourceMMContainer


cachelib/allocator/CacheAllocator-inl.h, line 1973 at r4 (raw file):

typename CacheAllocator<CacheTrait>::ItemHandle
CacheAllocator<CacheTrait>::getSampleItem() {
  // TODO: is using random tier a good idea?

I think it is OK since this method is called only by test. But let's keep the TODO comment to discuss it with Facebook folks.


cachelib/allocator/CacheAllocator-inl.h, line 2200 at r4 (raw file):

  PoolId pid = 0;
  for (TierId tid = 0; tid < numTiers_; tid++) {
    // TODO: adjust size per tier

With Victoria's changes we should be able t adjust the size based on config params, isn't it?


cachelib/allocator/CacheAllocator-inl.h, line 3169 at r4 (raw file):

  auto serializeMMContainers = [](MMContainers& mmContainers) {
    MMSerializationTypeContainer state;
    for (unsigned int i = 0; i < mmContainers[0 /* TODO */].size(); ++i) {

Should we do it for each tier?


cachelib/allocator/CacheAllocator-inl.h, line 3189 at r4 (raw file):

  AccessSerializationType accessContainerState = accessContainer_->saveState();
  // TODO: foreach allocator

Why we haven't implemented it?


cachelib/allocator/CacheAllocator-inl.h, line 3253 at r4 (raw file):

  shmManager_.reset();

  // TODO: save per-tier state

The same comment as above.


cachelib/allocator/CacheAllocator-inl.h, line 3660 at r4 (raw file):

      ShmManager::cleanup(cacheDir, posix);

      // TODO: cleanup per-tier state

Should we implement it?


cachelib/allocator/memory/Slab.h, line 56 at r4 (raw file):

// it has a different type than PoolId and ClassID
// to allow overloads
using TierId = uint8_t;

Since we use your trick with currentTier(), do we need the different types for TierId and PoolId

Copy link
Author

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 10 of 12 files reviewed, 16 unresolved discussions (waiting on @guptask, @igchor, and @vinser52)


cachelib/allocator/CacheAllocator.h, line 629 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Is this TODO still relevant? We call currentTier() inside implementation. Another question can this method be called by user? If so, it is a possible issue because the user should not set currentTier.

Yeah, this can be called by the user so I think this is still relevant - should this function shrink pool on specific tier or on all tiers at once?


cachelib/allocator/CacheAllocator.h, line 640 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

The same comment as above

.


cachelib/allocator/CacheAllocator.h, line 1609 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Should we fix this TODO.

Done.


cachelib/allocator/CacheAllocator.h, line 1754 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

I thoughts you already implemented this function.

So, this function is used for methods which can be called by a user/background workers where use/bg worker must have per-tier knowledge. This is the problem I raised some time ago: how we can expose the statistics/operations for different tiers.

There is no good answer for that right now, so for this PR I have not implemented this correctly (this is not really needed for benchmarks, unless we are relying on bg workers).

I have this hack implementation which allows setting currentTier by user/bg worker but this is not needed for corectness for now (at least I think so)


cachelib/allocator/CacheAllocator-inl.h, line 1476 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

why we renamed mmContainer -> sourceMMContainer

Done. It was remeinder of the previous implementation.


cachelib/allocator/CacheAllocator-inl.h, line 1973 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

I think it is OK since this method is called only by test. But let's keep the TODO comment to discuss it with Facebook folks.

Ok


cachelib/allocator/CacheAllocator-inl.h, line 2200 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

With Victoria's changes we should be able t adjust the size based on config params, isn't it?

Done.


cachelib/allocator/CacheAllocator-inl.h, line 3169 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Should we do it for each tier?

Yes, most probably, but this is not so critical I believe - this is for serialization.


cachelib/allocator/CacheAllocator-inl.h, line 3189 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Why we haven't implemented it?

I skipped the things which are not so important - like serialization


cachelib/allocator/CacheAllocator-inl.h, line 3253 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

The same comment as above.

.


cachelib/allocator/CacheAllocator-inl.h, line 3660 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Should we implement it?

.


cachelib/allocator/memory/Slab.h, line 56 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Since we use your trick with currentTier(), do we need the different types for TierId and PoolId

Why?

Copy link
Collaborator

@vinser52 vinser52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r5, all commit messages.
Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @guptask and @igchor)


cachelib/allocator/CacheAllocator.h, line 629 at r4 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

Yeah, this can be called by the user so I think this is still relevant - should this function shrink pool on specific tier or on all tiers at once?

I would say that this function should shrink all tiers according to the ratio defined by tiers config. There are two reasons I have in mind:

  1. This function shrinks the pool and pool logically spread across tiers.
  2. In case we want to shrink a particular tier caller (user or background thread) should specify tierId explicitly. For that purpose, we need a separate overload.

If we keep it in such a way just for benchmarking purposes and going to implement overloads for per-tier shrinking in the future I am fine.


cachelib/allocator/CacheAllocator.h, line 640 at r4 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

.

The same as above


cachelib/allocator/CacheAllocator.h, line 1754 at r4 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

So, this function is used for methods which can be called by a user/background workers where use/bg worker must have per-tier knowledge. This is the problem I raised some time ago: how we can expose the statistics/operations for different tiers.

There is no good answer for that right now, so for this PR I have not implemented this correctly (this is not really needed for benchmarks, unless we are relying on bg workers).

I have this hack implementation which allows setting currentTier by user/bg worker but this is not needed for corectness for now (at least I think so)

I got your point that for benchmarking purpose the BG workers are not important. But what you mean under "his function is used for methods which can be called by a user" my understanding think that if user calls some method it should be applied to all tiers or user have to specify tier explicitly.


cachelib/allocator/CacheAllocator-inl.h, line 3169 at r4 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

Yes, most probably, but this is not so critical I believe - this is for serialization.

Got it, let's add TODO comment at least/


cachelib/allocator/memory/Slab.h, line 56 at r4 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

Why?

maybe I do not understand which overloads did you mean. Can you provide an example?

Copy link
Author

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 9 of 18 files reviewed, 9 unresolved discussions (waiting on @guptask, @igchor, and @vinser52)


cachelib/allocator/CacheAllocator.h, line 1676 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Should we fix this TODO.

Done. (kind of, this is very hacky)


cachelib/allocator/CacheAllocator.h, line 1754 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

I got your point that for benchmarking purpose the BG workers are not important. But what you mean under "his function is used for methods which can be called by a user" my understanding think that if user calls some method it should be applied to all tiers or user have to specify tier explicitly.

Yes, that's why there this TODO - basically each method which uses this function needs to be either changed to work on all tiers or accept additional tierId parameter.

Copy link
Author

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 7 of 18 files reviewed, 8 unresolved discussions (waiting on @guptask and @vinser52)


cachelib/allocator/CacheAllocator.h, line 629 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

I would say that this function should shrink all tiers according to the ratio defined by tiers config. There are two reasons I have in mind:

  1. This function shrinks the pool and pool logically spread across tiers.
  2. In case we want to shrink a particular tier caller (user or background thread) should specify tierId explicitly. For that purpose, we need a separate overload.

If we keep it in such a way just for benchmarking purposes and going to implement overloads for per-tier shrinking in the future I am fine.

That's the plan - I extended comment in currentTier()


cachelib/allocator/CacheAllocator-inl.h, line 3169 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Got it, let's add TODO comment at least/

Done.


cachelib/allocator/memory/Slab.h, line 56 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

maybe I do not understand which overloads did you mean. Can you provide an example?

Sorry, I forgot I put this comment there ;d Actually you are right, it is not necessary to have different types anymore. It;s only needed where we would have a function like get_stats(PoolId, TierId tid); get_stats(PoolId, SlabId slabId) or something.

Copy link
Collaborator

@vinser52 vinser52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 8 files at r1, 2 of 11 files at r7, 9 of 9 files at r8, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @guptask)

@vinser52
Copy link
Collaborator

I am confused that the build still failed.

@igchor igchor force-pushed the poc_tiers branch 2 times, most recently from 86ecd09 to f2d5240 Compare December 14, 2021 15:34
Copy link
Collaborator

@vinser52 vinser52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r9, all commit messages.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on @guptask)

@igchor igchor marked this pull request as ready for review December 14, 2021 15:53
@igchor igchor force-pushed the poc_tiers branch 6 times, most recently from aa9a5c1 to 9ed5abc Compare December 20, 2021 14:17
@igchor
Copy link
Author

igchor commented Dec 20, 2021

All changes integrated into: #21

@igchor igchor closed this Dec 20, 2021
Now it's size is 8 bytes intead of 4.

Original CompressedPtr stored only some offset with a memory Allocator.
For multi-tier implementation, this is not enough. We must also store
tierId and when uncompressing, select a proper allocator.

An alternative could be to just resign from CompressedPtr but they
are leveraged to allow the cache to be mapped to different addresses on shared memory.

Changing CompressedPtr impacted CacheItem size - it increased from 32 to 44 bytes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants