sys_heap heap allocator optimizations #24249
Conversation
|
All checks are passing now. checkpatch (informational only, not a failure). Tip: The bot edits this comment instead of posting a new one, so you can check the comment's history to see earlier messages. |
|
parent PR now merged, please rebase |
|
On Tue, 14 Apr 2020, Andrew Boie wrote:
parent PR now merged, please rebase
Done.
|
|
Bunch of tests failing with " Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/lifo/lifo_usage/src/main.c:329: test_timeout_lifo_thread: packet != NULL is false" Gonna kick CI again to make sure. |
|
Seeing a consistent assertion failure: kicking again to make sure |
|
On Mon, 11 May 2020, Andrew Boie wrote:
Seeing a consistent assertion failure:
```
starting test - test_mheap_block_desc
Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/mem_heap/mheap_api_concept/src/test_mheap_concept.c:129: test_mheap_block_desc: block_fail is not NULL
```
OK, found it.
The last patch is titled "sys_heap: reduce the size of struct
z_heap_bucket by half". It is missing the following change to a file that
didn't exist when the patch was initially created:
```
diff --git a/include/mempool_heap.h b/include/mempool_heap.h
index df64322b50..439371556d 100644
--- a/include/mempool_heap.h
+++ b/include/mempool_heap.h
@@ -35,7 +35,7 @@ struct k_mem_pool {
* that k_heap does not. We make space for the number of maximum
* objects defined, and include extra so there's enough metadata space
* available for the maximum number of minimum-sized objects to be
- * stored: 8 bytes for each desired chunk header, and a 24 word block
+ * stored: 8 bytes for each desired chunk header, and a 12 word block
* to reserve room for a "typical" set of bucket list heads (this size
* was picked more to conform with existing test expectations than any
* rigorous theory -- we have tests that rely on being able to
@@ -46,7 +46,7 @@ struct k_mem_pool {
K_HEAP_DEFINE(poolheap_##name, \
((maxsz) * (nmax)) \
+ 8 * ((maxsz) * (nmax) / (minsz)) \
- + 24 * sizeof(void *)); \
+ + 12 * sizeof(void *)); \
struct k_mem_pool name = { \
.heap = &poolheap_##name \
}
```
Should be fine now.
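For illustration, here is how the sizing expression above works out for a hypothetical pool (the parameter values are made up for this example and are not taken from any Zephyr test):
```
/* Hypothetical pool parameters, chosen only for illustration. */
#define MINSZ 16	/* minimum block size in bytes */
#define MAXSZ 64	/* maximum block size in bytes */
#define NMAX   4	/* number of maximum-size blocks */

#include <stdio.h>

int main(void)
{
	/* Mirrors the K_HEAP_DEFINE() expression in the diff above:
	 * data area, plus 8 bytes of chunk header per minimum-sized
	 * object, plus 12 words reserved for bucket list heads.
	 */
	unsigned long data  = MAXSZ * NMAX;                  /* 256 */
	unsigned long hdrs  = 8 * ((MAXSZ * NMAX) / MINSZ);  /* 128 */
	unsigned long extra = 12 * sizeof(void *);           /* 96 on a 64-bit host */

	printf("heap size = %lu bytes\n", data + hdrs + extra);
	return 0;
}
```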
|
Still failing. |
andyross
left a comment
Just throwing up my +1 for when this exits the CI banging phase. These all look great to me.
|
On Tue, 12 May 2020, Andy Ross wrote:
Just throwing up my +1 for when this exits the CI banging phase. These all look great to me.
Thanks.
What is not great are all those tests: some of them are pure bollocks in
the context of the heap allocator.
Some patches in this series reduce the heap footprint overhead and then
some allocation tests pass when they were expected to fail with ENOMEM.
So futzing with the fake numbers within Z_MEM_POOL_DEFINE() makes them
pass, but then other tests start failing for the opposite reason.
So I just picked another semi-random number to satisfy all tests...
unless I missed some.
|
First, some renames to make accessors more explicit:
size() --> chunk_size()
used() --> chunk_used()
free_prev() --> prev_free_chunk()
free_next() --> next_free_chunk()
Then, the return type of chunk_size() is changed from chunkid_t to size_t, and chunk_used() from chunkid_t to bool. The left_size() accessor is used only once and can easily be substituted by left_chunk(), so it is removed. And in free_list_add() the variable b is renamed to bi so as to be consistent with its usage in sys_heap_alloc(). Signed-off-by: Nicolas Pitre <[email protected]>
Let's provide accessors for getting and setting every field, so that the chunk header layout is abstracted away from the main code. Those are:
SIZE_AND_USED: chunk_used(), chunk_size(), set_chunk_used() and set_chunk_size().
LEFT_SIZE: left_chunk() and set_left_chunk_size().
FREE_PREV: prev_free_chunk() and set_prev_free_chunk().
FREE_NEXT: next_free_chunk() and set_next_free_chunk().
To be consistent, the former chunk_set_used() is now set_chunk_used(). Signed-off-by: Nicolas Pitre <[email protected]>
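As a rough illustration of that accessor scheme (a simplified sketch with an assumed field layout, not the actual sys_heap code), the getters and setters might look like this:
```
/* Simplified sketch only: the real chunk header is packed differently. */
#include <stdbool.h>
#include <stddef.h>

typedef size_t chunkid_t;

struct demo_chunk {
	size_t left_size;	/* LEFT_SIZE field */
	size_t size;		/* SIZE_AND_USED: size part */
	bool   used;		/* SIZE_AND_USED: used part */
	chunkid_t free_prev;	/* FREE_PREV field */
	chunkid_t free_next;	/* FREE_NEXT field */
};

struct demo_heap {
	struct demo_chunk chunk[16];
};

static inline bool chunk_used(struct demo_heap *h, chunkid_t c)
{
	return h->chunk[c].used;
}

static inline size_t chunk_size(struct demo_heap *h, chunkid_t c)
{
	return h->chunk[c].size;
}

static inline void set_chunk_used(struct demo_heap *h, chunkid_t c, bool used)
{
	h->chunk[c].used = used;
}

static inline void set_chunk_size(struct demo_heap *h, chunkid_t c, size_t size)
{
	h->chunk[c].size = size;
}

static inline chunkid_t left_chunk(struct demo_heap *h, chunkid_t c)
{
	return c - h->chunk[c].left_size;
}

static inline void set_left_chunk_size(struct demo_heap *h, chunkid_t c,
					size_t size)
{
	h->chunk[c].left_size = size;
}

static inline chunkid_t prev_free_chunk(struct demo_heap *h, chunkid_t c)
{
	return h->chunk[c].free_prev;
}

static inline chunkid_t next_free_chunk(struct demo_heap *h, chunkid_t c)
{
	return h->chunk[c].free_next;
}

/* set_prev_free_chunk() and set_next_free_chunk() would follow the
 * same pattern as the setters above.
 */
```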
By storing the used flag in the LSB, it is no longer necessary to have a size_mask variable to locate that flag. This produces smaller and faster code. Replace the validation check in chunk_set() to base it on the storage type. Also clarify the semantics of set_chunk_size(), which allows for clearing the used flag bit unconditionally; this simplifies the code further. The idea of moving the used flag bit into the LEFT_SIZE field was raised. It turns out that this isn't as beneficial as it may seem, because the used bit is set only once, i.e. when the memory is handed off to a user, and the size field becomes frozen at that point. Modifications on the leftward chunk may still occur, and extra instructions to preserve that bit would be necessary if it were moved there. Signed-off-by: Nicolas Pitre <[email protected]>
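A minimal sketch of that encoding, assuming one word of SIZE_AND_USED storage per chunk (names and storage layout are illustrative only):
```
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef size_t chunkid_t;
typedef uint32_t chunk_field_t;

/* One packed SIZE_AND_USED field per chunk: size in the upper bits,
 * used flag in the LSB.
 */
static chunk_field_t size_and_used[16];

static inline bool chunk_used(chunkid_t c)
{
	return (size_and_used[c] & 1U) != 0U;
}

static inline size_t chunk_size(chunkid_t c)
{
	return size_and_used[c] >> 1;
}

static inline void set_chunk_used(chunkid_t c, bool used)
{
	size_and_used[c] = (size_and_used[c] & ~1U) | (used ? 1U : 0U);
}

/* Setting the size unconditionally clears the used bit, which is the
 * simplification mentioned above.
 */
static inline void set_chunk_size(chunkid_t c, size_t size)
{
	size_and_used[c] = (chunk_field_t)(size << 1);
}
```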
It is possible to remove a few fields from struct z_heap, removing some runtime indirections by doing so:
- The buf pointer is actually the same as the struct z_heap pointer itself. So let's simply create chunk_buf() that performs a type conversion. That type is also chunk_unit_t now rather than u64_t, so it can be defined based on CHUNK_UNIT.
- Replace the struct z_heap_bucket pointer by a zero-sized array at the end of struct z_heap.
- Make chunk #0 into an actual chunk with its own header. This allows for removing the chunk0 field and streamlining the code. This way h->chunk0 becomes right_chunk(h, 0).
This sets the table for further simplifications to come. Signed-off-by: Nicolas Pitre <[email protected]>
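A rough sketch of what those layout changes amount to (names here are illustrative, not the actual struct z_heap definition):
```
#include <stdint.h>

#define CHUNK_UNIT 8

typedef struct { char bytes[CHUNK_UNIT]; } chunk_unit_t;

struct demo_bucket {
	uint32_t next;
};

struct demo_heap {
	uint32_t len;
	uint32_t avail_buckets;
	/* Bucket list heads as a flexible array member at the end of the
	 * struct (the commit uses a zero-sized array, the GNU C
	 * equivalent), replacing a separate pointer field.
	 */
	struct demo_bucket buckets[];
};

/* The heap header itself occupies chunk #0, so the chunk buffer is just
 * a cast of the heap pointer: no separate buf field is needed.
 */
static inline chunk_unit_t *chunk_buf(struct demo_heap *h)
{
	return (chunk_unit_t *)h;
}
```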
With this we can remove magic constants, especially those used with big_heap(). Signed-off-by: Nicolas Pitre <[email protected]>
We already have chunk #0 containing our struct z_heap and marked as used. We can add a partial chunk at the very end that is also marked as used. By doing so there is no longer a need for checking heap boundaries at run time when merging/splitting chunks, meaning fewer conditionals in the code's hot path. Signed-off-by: Nicolas Pitre <[email protected]>
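Conceptually, the effect can be sketched like this (simplified, with made-up names; not the actual heap code): with used sentinel chunks at both ends, the merge path never needs to compare against the heap bounds.
```
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define NCHUNKS 16

/* One entry per chunk-sized unit; chunks[c].size is the chunk length
 * in units, so c + size is the chunk to the right.
 */
static struct {
	size_t size;
	bool used;
} chunks[NCHUNKS];

static void heap_init(void)
{
	chunks[0].size = 1;			/* chunk #0: header, marked used */
	chunks[0].used = true;
	chunks[1].size = NCHUNKS - 2;		/* one big free chunk */
	chunks[1].used = false;
	chunks[NCHUNKS - 1].size = 1;		/* trailing partial chunk, used */
	chunks[NCHUNKS - 1].used = true;
}

/* Merge the right neighbour into c if it is free.  No bounds check is
 * needed: the walk always stops on one of the used sentinel chunks.
 */
static void merge_right_if_free(size_t c)
{
	size_t r = c + chunks[c].size;

	if (!chunks[r].used) {
		chunks[c].size += chunks[r].size;
	}
}

int main(void)
{
	heap_init();
	merge_right_if_free(1);	/* hits the end sentinel, nothing to merge */
	printf("chunk 1 size: %zu\n", chunks[1].size);
	return 0;
}
```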
Avoid redundancy and bucket_idx() usage when possible. Signed-off-by: Nicolas Pitre <[email protected]>
Make the LEFT_SIZE field first and the SIZE_AND_USED field last (for an allocated chunk) so they sit right next to the allocated memory. The current chunk's SIZE_AND_USED field points to the next (right) chunk, and from there the LEFT_SIZE field should point back to the current chunk. Many trivial memory overflows should trip that test. One way to make this test more robust could involve xor'ing the values within respective accessor pairs. But at least the fact that the size value is shifted by one bit already prevents fooling the test with a same-byte corruption. Signed-off-by: Nicolas Pitre <[email protected]>
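A sketch of that back-pointer round trip, using illustrative field names rather than the actual sys_heap helpers:
```
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-chunk header fields, illustrative layout only. */
static struct {
	size_t left_size;	/* LEFT_SIZE: size of the chunk to the left */
	size_t size;		/* SIZE_AND_USED: size of this chunk */
} chunks[16];

/* The current chunk's size leads to its right neighbour; that
 * neighbour's LEFT_SIZE must lead straight back.  An overflow that
 * clobbers either field breaks the round trip.
 */
static bool chunk_header_ok(size_t c)
{
	size_t r = c + chunks[c].size;		/* right_chunk(c) */

	return r - chunks[r].left_size == c;	/* left_chunk(r) == c */
}

int main(void)
{
	chunks[1].size = 4;
	chunks[5].left_size = 4;
	assert(chunk_header_ok(1));
	return 0;
}
```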
This struct is taking up most of the heap's constant footprint overhead. We can easily get rid of the list_size member as it is mostly used to determine if the list is empty, and that can be determined through other means. Signed-off-by: Nicolas Pitre <[email protected]>
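One way to test for an empty bucket without a counter is a per-bucket availability bit; the sketch below is illustrative and its field names are assumptions, not the actual sys_heap definitions:
```
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct demo_heap {
	uint32_t avail_buckets;	/* bit b set => bucket b has free chunks */
};

static inline bool bucket_is_empty(const struct demo_heap *h, int bidx)
{
	return (h->avail_buckets & (1U << bidx)) == 0U;
}

int main(void)
{
	struct demo_heap h = { .avail_buckets = 1U << 3 };

	printf("bucket 3 empty: %d\n", bucket_is_empty(&h, 3));	/* 0 */
	printf("bucket 4 empty: %d\n", bucket_is_empty(&h, 4));	/* 1 */
	return 0;
}
```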
|
There are some build failures here that I am not sure are attributable to this PR's changes. @nashif any ideas? |
|
The current failure in |
@carlescufi The QEMU ARC platform does not have icount enabled and can be quite unstable when running sanitychecks (basically the same issue we had with other platforms earlier). Re-triggering should fix the problem. EDIT: This is not an intermittent failure. I just locally ran this test and it consistently fails (both on the latest master and in this PR). It looks like this test is actually broken (see #26163, disabled in #26328). |
This goes on top of PR #23941 and therefore patches from PR #23941 are included here.
This series implements an assortment of optimizations providing both
execution efficiency gains and footprint overhead reduction. Because of that, a few tests presuming a higher overhead are now failing.
I'm posting this here for people to see and comment.