Skip to content

Conversation

@guptask
Copy link

@guptask guptask commented Dec 15, 2022

This change is Reviewable

@guptask guptask self-assigned this Dec 15, 2022
Copy link

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 6 files reviewed, 1 unresolved discussion (waiting on @guptask)


cachelib/allocator/memory/CompressedPtr.h line 197 at r1 (raw file):

    TierId tid;
    for (tid = 0; tid < allocators_.size(); tid++) {
      if (allocators_[tid]->isMemoryInAllocator(static_cast<const void*>(uncompressed)))

does it compile? on upstream, allocators_ is not a vector, but just a single pointer, right?
I think you could probably just hardcode isMultiTiered to false in this function for now.

@guptask guptask force-pushed the upstream_cptr branch 2 times, most recently from 3499c83 to 645c333 Compare December 16, 2022 21:01
@guptask guptask marked this pull request as ready for review December 16, 2022 21:03
@guptask
Copy link
Author

guptask commented Dec 16, 2022

cachelib/allocator/memory/CompressedPtr.h line 197 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

does it compile? on upstream, allocators_ is not a vector, but just a single pointer, right?
I think you could probably just hardcode isMultiTiered to false in this function for now.

It compiles now. I renamed SingleTierPtrCompressor to PtrCompressor. We can revert to SingleTierPtrCompressor name later with the other changes. This is the least intrusive option in my opinion for merging into meta/main. I took out the AllocatorContainer based PtrCompressor for now.

@guptask guptask requested a review from igchor December 16, 2022 21:06
@guptask guptask changed the title added ability for compressed pointer to use full 32 bits for addressing in single tier mode and use 31 bits for addressing in multi-tier mode [upstream] added ability for compressed pointer to use full 32 bits for addressing in single tier mode and use 31 bits for addressing in multi-tier mode Dec 16, 2022
Copy link

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 6 files at r1, 3 of 4 files at r2, all commit messages.
Reviewable status: 6 of 7 files reviewed, 3 unresolved discussions (waiting on @guptask)


cachelib/allocator/memory/CompressedPtr.h line 197 at r1 (raw file):

Previously, guptask (Sounak Gupta) wrote…

It compiles now. I renamed SingleTierPtrCompressor to PtrCompressor. We can revert to SingleTierPtrCompressor name later with the other changes. This is the least intrusive option in my opinion for merging into meta/main. I took out the AllocatorContainer based PtrCompressor for now.

Ok, makes sense. I think you can still include the implementation which operates on vector of allocators. Just please makes sure it is not used anywhere (perhaps just add some simple test for it) and it has different name than PtrCompressor, e.g. MultiTierPtrCompressor or something like this. In the previous version, both classes were named the same and that the issue for me.


cachelib/allocator/memory/CompressedPtr.h line 41 at r2 (raw file):

// the 32nd bit only since its value cannot exceed kMaxTiers (2). This leaves
// the remaining (32 - (kNumSlabBits - 6) - 1 bit for tier id) =  15 bits  for
// the  slab  index. Hence we can index 128 GiB of memory in slabs per tier and

128 GiB of memory per tier in mutlti-tier configuration or 256 GiB in single-tier


cachelib/allocator/memory/CompressedPtr.h line 118 at r2 (raw file):

  // Number of bits for the slab index. This will be the top 16 bits of the
  // compressed ptr.
  static constexpr unsigned int kNumSlabIdxBits =

This change is a bit misleading I think. This value will represent number of bits used for slab indexing only for mutli-tier, for single tier it will be this value + 1, right? Maybe just replace this constant with a constexpr function?
static constexpr unsigned inte kNumSlabIdxBits(bool isMultiTiered) { return kNumTierIdxOffset - kNumAllocIdxBits + (!isMultiTiered); }

Then you wan't need to to add 1 in other places if isMultitTiered is false.

Copy link

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 6 of 7 files reviewed, 3 unresolved discussions (waiting on @guptask)


cachelib/allocator/memory/CompressedPtr.h line 197 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

Ok, makes sense. I think you can still include the implementation which operates on vector of allocators. Just please makes sure it is not used anywhere (perhaps just add some simple test for it) and it has different name than PtrCompressor, e.g. MultiTierPtrCompressor or something like this. In the previous version, both classes were named the same and that the issue for me.

Or if you don;t include that implementation, please add some comment or some info in the commit message that it needs to be implemented later.

@guptask
Copy link
Author

guptask commented Dec 19, 2022

cachelib/allocator/memory/CompressedPtr.h line 197 at r1 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

Or if you don;t include that implementation, please add some comment or some info in the commit message that it needs to be implemented later.

done

Copy link
Author

@guptask guptask left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 5 of 7 files reviewed, 2 unresolved discussions (waiting on @guptask and @igchor)


cachelib/allocator/memory/CompressedPtr.h line 41 at r2 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

128 GiB of memory per tier in mutlti-tier configuration or 256 GiB in single-tier

Done.


cachelib/allocator/memory/CompressedPtr.h line 118 at r2 (raw file):

Previously, igchor (Igor Chorążewicz) wrote…

This change is a bit misleading I think. This value will represent number of bits used for slab indexing only for mutli-tier, for single tier it will be this value + 1, right? Maybe just replace this constant with a constexpr function?
static constexpr unsigned inte kNumSlabIdxBits(bool isMultiTiered) { return kNumTierIdxOffset - kNumAllocIdxBits + (!isMultiTiered); }

Then you wan't need to to add 1 in other places if isMultitTiered is false.

Done.

@guptask guptask requested a review from igchor December 19, 2022 23:56
Copy link

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 4 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @guptask)

Copy link

@igchor igchor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 7 of 8 files reviewed, all discussions resolved (waiting on @guptask)

Copy link

@byrnedj byrnedj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good - I just have one comment on the tests. I also was able to compile and pass all tests.

Reviewed 1 of 1 files at r4, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @guptask)


cachelib/allocator/memory/tests/MemoryAllocatorTest.cpp line 410 at r4 (raw file):

  }

  ASSERT_EQ(nullptr, m.unCompress(m.compress(nullptr, false), false));

I would recommend adding the following block after this line. This way we can at least say the code is tested. In fact, I tried adding this and it passed.

Code snippet:

   // and also test with multi-tier
   for (const auto& pool : poolAllocs) {
     const auto& allocs = pool.second;
     for (const auto* alloc : allocs) {
       CompressedPtr ptr = m.compress(alloc, true);
       ASSERT_FALSE(ptr.isNull());
       ASSERT_EQ(alloc, m.unCompress(ptr, true));
     }
   }

   ASSERT_EQ(nullptr, m.unCompress(m.compress(nullptr, true), false));

// the slab using a 32 bit representation.
//
template <typename PtrType, typename AllocatorContainer>
class MultiTierPtrCompressor;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this forward declaration?

};

template <typename PtrType, typename AllocatorContainer>
class MultiTierPtrCompressor {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where this class template is used?

static PtrType compress(uint32_t slabIdx,
uint32_t allocIdx,
bool isMultiTiered,
TierId tid) noexcept {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where we are storing tid? I believe we need a unit test to check compress function

@guptask
Copy link
Author

guptask commented Jan 4, 2023

cachelib/allocator/memory/CompressedPtr.h line 31 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

why do we need this forward declaration?

you're right. The forward declaration is needed because of the following in CompressedPtr class:

template <typename PtrType, typename AllocatorContainer>
friend class MultiTierPtrCompressor;

But for upstream PR, I took it out. I'm adding it back.

@guptask
Copy link
Author

guptask commented Jan 4, 2023

cachelib/allocator/memory/CompressedPtr.h line 190 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

where this class template is used?

It won't be used in the upstream PR. This is a foundational PR meant to add the framework. The MultiTierPtrCompressor will be used later. This is what Igor and I discussed.

Copy link

@vinser52 vinser52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @guptask)


cachelib/allocator/memory/CompressedPtr.h line 190 at r4 (raw file):

Previously, guptask (Sounak Gupta) wrote…

It won't be used in the upstream PR. This is a foundational PR meant to add the framework. The MultiTierPtrCompressor will be used later. This is what Igor and I discussed.

If a class template is not used anywhere the compiler does not even compile it. It means you might even have a compilation error inside such a class template and the compiler will not detect it. Maybe better to upstream this template later when it is actually used? Or (if we want to upstream it in this PR) introduce a unit test for it so that it is compiled by the compiler.

@guptask
Copy link
Author

guptask commented Jan 6, 2023

cachelib/allocator/memory/tests/MemoryAllocatorTest.cpp line 410 at r4 (raw file):

Previously, byrnedj (Daniel Byrne) wrote…

I would recommend adding the following block after this line. This way we can at least say the code is tested. In fact, I tried adding this and it passed.

Added

@guptask
Copy link
Author

guptask commented Jan 6, 2023

cachelib/allocator/memory/CompressedPtr.h line 130 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

where we are storing tid? I believe we need a unit test to check compress function

The CompressedPtr is backward compatible. The tid part is set to default 0 in CompressedPtr constructor. Also the following use case

bool isMultiTiered = allocators_.size() > 1;
auto cptr = allocators_[tid]->compress(uncompressed, isMultiTiered);
if (isMultiTiered) { // config has multiple tiers
  cptr.setTierId(tid);
}

@guptask
Copy link
Author

guptask commented Jan 6, 2023

fix added. Thanks for posting the code stub Daniel.

@guptask guptask requested a review from byrnedj January 6, 2023 19:09
Copy link

@vinser52 vinser52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 7 of 8 files reviewed, 5 unresolved discussions (waiting on @byrnedj, @guptask, and @igchor)


cachelib/allocator/memory/tests/MemoryAllocatorTest.cpp line 418 at r5 (raw file):

      MultiTierPtrCompressor ptr = m.compress(alloc, false);
      ASSERT_FALSE(ptr.isNull());
      ASSERT_EQ(alloc, m.unCompress(ptr, false));

Can we add some Unit tests for multitier?

@guptask
Copy link
Author

guptask commented Jan 10, 2023

cachelib/allocator/memory/CompressedPtr.h line 31 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

I still do not understand. If it is not needed right now by this PR let's remove it.

Done.

@guptask
Copy link
Author

guptask commented Jan 10, 2023

cachelib/allocator/memory/CompressedPtr.h line 130 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Yeah, this is exactly my point, as Igor described.

I took out MultiTierPtrCompressor for this PR. The difference between MultiTierPtrCompressor and PtrCompressor is that the 32nd bit can be reserved for tier id. In case of PtrCompressor we don't allow this 32nd bit to be set whereas in case of MultiTierPtrCompressor this 32nd bit is reserved for tier id. That's why the allocators.size() check is made to ensure whether we're dealing with single or multi-tier. In case of single tier, even if you use MultiTierPtrCompressor, it won't allow you to reserve the 32nd bit for tier id. This was done to create a common structure that'll allow all 32 bits in MultiTierPtrCompressor to be used for addressing when config is single tiered.

@guptask
Copy link
Author

guptask commented Jan 10, 2023

cachelib/allocator/memory/tests/MemoryAllocatorTest.cpp line 418 at r5 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Can we add some Unit tests for multitier?

Taking out MultiTierPtrCompressor and I added a tier test based on Daniel's suggestion above.

@guptask
Copy link
Author

guptask commented Jan 10, 2023

I took out MultiTierPtrCompressor for this PR.

Copy link

@vinser52 vinser52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 6 of 8 files reviewed, 5 unresolved discussions (waiting on @byrnedj and @igchor)


cachelib/allocator/memory/CompressedPtr.h line 130 at r4 (raw file):

Previously, guptask (Sounak Gupta) wrote…

I took out MultiTierPtrCompressor for this PR. The difference between MultiTierPtrCompressor and PtrCompressor is that the 32nd bit can be reserved for tier id. In case of PtrCompressor we don't allow this 32nd bit to be set whereas in case of MultiTierPtrCompressor this 32nd bit is reserved for tier id. That's why the allocators.size() check is made to ensure whether we're dealing with single or multi-tier. In case of single tier, even if you use MultiTierPtrCompressor, it won't allow you to reserve the 32nd bit for tier id. This was done to create a common structure that'll allow all 32 bits in MultiTierPtrCompressor to be used for addressing when config is single tiered.

Why you added tid input parameter but it is not used anywhere in this function?

Copy link

@vinser52 vinser52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 6 of 8 files reviewed, 6 unresolved discussions (waiting on @byrnedj, @guptask, and @igchor)


cachelib/allocator/memory/CompressedPtr.h line 32 at r6 (raw file):

// This CompressedPtr makes decompression fast by staying away from division and
// modulo arithmetic and doing those during the compression time. We most  often
// decompress a CompressedPtr than compress a pointer  while creating one.  This

3 extra whitespace should be removed.

@guptask
Copy link
Author

guptask commented Jan 10, 2023

cachelib/allocator/memory/CompressedPtr.h line 130 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Why you added tid input parameter but it is not used anywhere in this function?

This was meant to introduce the API change with backward compatibility. The final definition would look like what's the complete change in develop branch:

// Compress the given slabIdx and allocIdx into a 64-bit compressed
// pointer.
static PtrType compress(uint32_t slabIdx, uint32_t allocIdx, bool isMultiTiered, TierId tid) noexcept {
XDCHECK_LE(allocIdx, kAllocIdxMask);
if (!isMultiTiered) {
XDCHECK_LT(slabIdx, (1u << (kNumSlabIdxBits+1)) - 1);
return (slabIdx << kNumAllocIdxBits) + allocIdx;
}
XDCHECK_LT(slabIdx, (1u << kNumSlabIdxBits) - 1);
return (static_cast<uint64_t>(tid) << kNumTierIdxOffset) + (slabIdx << kNumAllocIdxBits) + allocIdx;
}

Do you think this API change introduction is better with rest of tid details (rather than a placeholder in this PR) ?

@guptask
Copy link
Author

guptask commented Jan 10, 2023

cachelib/allocator/memory/CompressedPtr.h line 32 at r6 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

3 extra whitespace should be removed.

Done.

Copy link

@vinser52 vinser52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 6 of 8 files reviewed, 5 unresolved discussions (waiting on @byrnedj and @igchor)


cachelib/allocator/memory/CompressedPtr.h line 130 at r4 (raw file):

Do you think this API change introduction is better with rest of tid details (rather than a placeholder in this PR) ?
Yes, I do think that full implementation like in develop branch is what we need in this PR. The current placeholder is not completed and I think it is error-prone because in the future we might forget to complete the implementation. And I see no barrier to putting complete implementation in this PR, am I wrong?

@guptask
Copy link
Author

guptask commented Jan 12, 2023

cachelib/allocator/memory/CompressedPtr.h line 130 at r4 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

Do you think this API change introduction is better with rest of tid details (rather than a placeholder in this PR) ?
Yes, I do think that full implementation like in develop branch is what we need in this PR. The current placeholder is not completed and I think it is error-prone because in the future we might forget to complete the implementation. And I see no barrier to putting complete implementation in this PR, am I wrong?

In https://github.com/intel/CacheLib/blob/develop/cachelib/allocator/memory/CompressedPtr.h
(develop branch Compressed Ptr) the PtrCompressor class requires an allocator container to be passed. In case of main branch, the PtrCompressor gets a single allocator. So raising a PR with complete set of changes from CompressedPtr would not work without rest of the multi-tier changes. The idea was to split the work into smaller chunks - that's why I broke up compressed pointer changes into 2 parts. One that can go independently; other with the multi-tier changes.

There are two options the way I see it -

Option 1. compressed pointer goes entirely with the multi-tier changes. That way the unit tests, that are in place in develop, can verify the compressed ptr part. The tid changes would be done everywhere; so compressed pointer code won't need any further change.

Option 2. The way we're doing it now. Create an initial PR that partially introduces API related changes in Compressed Pointer while keeping backward compatibility in main branch. Then the multi-tier PR can include the develop branch compressed ptr commit. Since the expectation is that API changes would already be part of main branch by that point, only the additional changes in Compressed Ptr would need review.

Copy link

@vinser52 vinser52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 6 of 8 files reviewed, 6 unresolved discussions (waiting on @byrnedj, @guptask, and @igchor)


cachelib/allocator/memory/CompressedPtr.h line 133 at r8 (raw file):

      return (slabIdx << kNumAllocIdxBits) + allocIdx;
    }
    return (static_cast<uint64_t>(tid) << kNumTierIdxOffset) +

now the logic looks good, but another question should it really be static_cast<uint64_t>? Should it be a 32-bit value?

…ng in single tier mode and use 31 bits for addressing in multi-tier mode
Copy link

@vinser52 vinser52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 6 of 8 files reviewed, 7 unresolved discussions (waiting on @byrnedj, @guptask, and @igchor)


cachelib/allocator/memory/CompressedPtr.h line 157 at r8 (raw file):

  void setTierId(TierId tid) noexcept {
    ptr_ += static_cast<uint64_t>(tid) << kNumTierIdxOffset;

As we discussed, it also should be static_cast<32>

@guptask
Copy link
Author

guptask commented Jan 12, 2023

cachelib/allocator/memory/CompressedPtr.h line 133 at r8 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

now the logic looks good, but another question should it really be static_cast<uint64_t>? Should it be a 32-bit value?

changed to 32 bits.

@guptask
Copy link
Author

guptask commented Jan 12, 2023

cachelib/allocator/memory/CompressedPtr.h line 157 at r8 (raw file):

Previously, vinser52 (Sergei Vinogradov) wrote…

As we discussed, it also should be static_cast<32>

Done.

@guptask guptask requested a review from vinser52 January 12, 2023 17:26
Copy link

@vinser52 vinser52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 6 of 8 files reviewed, 2 unresolved discussions (waiting on @byrnedj and @igchor)

Copy link

@vinser52 vinser52 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewable status: 6 of 8 files reviewed, 2 unresolved discussions (waiting on @byrnedj and @igchor)

@guptask guptask closed this Jan 17, 2023
byrnedj pushed a commit that referenced this pull request Jul 23, 2023
Disabled test suite allocator-test-AllocatorTypeTest to skip sporadically failing tests.
byrnedj pushed a commit that referenced this pull request Jul 23, 2023
Disabled test suite allocator-test-AllocatorTypeTest to skip sporadically failing tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants