Skip to content

Conversation

@npitre
Copy link

@npitre npitre commented Apr 9, 2020

This goes on top of PR #23941 and therefore patches from PR #23941 are included here.

This series implements an assortment of optimizations providing both
execution efficiency and footprint overhead reduction. Because of that, a few tests presuming a higher overhead are now failing.

I'm posting this here for people to see and comment.

@zephyrbot zephyrbot added area: API Changes to public APIs area: Tests Issues related to a particular existing or missing test area: Kernel labels Apr 9, 2020
@zephyrbot
Copy link

zephyrbot commented Apr 9, 2020

All checks are passing now.

checkpatch (informational only, not a failure)

-:126: WARNING:LONG_LINE: line over 80 characters
#126: FILE: lib/os/heap-validate.c:105:
+	for (c = right_chunk(h, 0); c <= max_chunkid(h); c = right_chunk(h, c)) {

-:156: WARNING:LONG_LINE: line over 80 characters
#156: FILE: lib/os/heap-validate.c:135:
+		for (c = c0; n == 0 || c != c0; n++, c = next_free_chunk(h, c)) {

-:170: WARNING:LONG_LINE: line over 80 characters
#170: FILE: lib/os/heap-validate.c:146:
+	for (c = right_chunk(h, 0); c <= max_chunkid(h); c = right_chunk(h, c)) {

-:328: WARNING:LONG_LINE: line over 80 characters
#328: FILE: lib/os/heap.c:124:
+		 "corrupted heap bounds (buffer overflow?) for memory at %p", mem);

-:389: WARNING:LINE_SPACING: Missing a blank line after declarations
#389: FILE: lib/os/heap.c:182:
+		int i = CONFIG_SYS_HEAP_ALLOC_LOOPS;
+		do {

-:487: WARNING:NEW_TYPEDEFS: do not add new typedefs
#487: FILE: lib/os/heap.h:52:
+typedef struct { char bytes[CHUNK_UNIT]; } chunk_unit_t;

-:685: WARNING:LINE_SPACING: Missing a blank line after declarations
#685: FILE: lib/os/heap.h:224:
+	size_t usable_sz = sz - min_chunk_size(h) + 1;
+	return 31 - __builtin_clz(usable_sz);

- total: 0 errors, 7 warnings, 654 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

Your patch has style problems, please review.

NOTE: Ignored message types: AVOID_EXTERNS BRACES CONFIG_EXPERIMENTAL CONST_STRUCT DATE_TIME FILE_PATH_CHANGES MINMAX NETWORKING_BLOCK_COMMENT_STYLE PREFER_KERNEL_TYPES PRINTK_WITHOUT_KERN_LEVEL SPDX_LICENSE_TAG SPLIT_STRING VOLATILE

NOTE: If any of the errors are false positives, please report
      them to the maintainers.

Tip: The bot edits this comment instead of posting a new one, so you can check the comment's history to see earlier messages.

@andrewboie
Copy link
Contributor

parent PR now merged, please rebase

@npitre
Copy link
Author

npitre commented Apr 14, 2020 via email

@andrewboie
Copy link
Contributor

Bunch of tests failing with " Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/lifo/lifo_usage/src/main.c:329: test_timeout_lifo_thread: packet != NULL is false"

Gonna kick CI again to make sure.

@andrewboie andrewboie closed this Apr 28, 2020
@andrewboie andrewboie reopened this Apr 28, 2020
@andrewboie
Copy link
Contributor

Seeing a consistent assertion failure:

starting test - test_mheap_block_desc

    Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/mem_heap/mheap_api_concept/src/test_mheap_concept.c:129: test_mheap_block_desc: block_fail is not NULL

kicking again to make sure

@npitre
Copy link
Author

npitre commented May 12, 2020 via email

@andrewboie
Copy link
Contributor

[0mSeaBIOS (version rel-1.12.1-0-ga5cab58-dirty-20200214_052440-f7294c49af13-zephyr
)
Booting from ROM..Optimal CONFIG_X86_MMU_PAGE_POOL_PAGES 8
*** Booting Zephyr OS build v2.3.0-rc1-25-g401d2147465a  ***
Running test suite mempool
===================================================================
starting test - test_pool_block_get

    Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/mem_pool/mem_pool/src/main.c:178: pool_block_get_work: (rv not equal to tests[i].rcode)
k_mem_pool_alloc() expected 0, got -12
size: 512, timeout: 0

FAIL - test_pool_block_get
===================================================================
starting test - test_pool_block_get_timeout

    Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/mem_pool/mem_pool/src/main.c:178: pool_block_get_work: (rv not equal to tests[i].rcode)
k_mem_pool_alloc() expected 0, got -11
size: 4096, timeout: 100

FAIL - test_pool_block_get_timeout
===================================================================
starting test - test_pool_block_get_wait

    Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/mem_pool/mem_pool/src/main.c:271: test_pool_block_get_wait: (rv not equal to 0)
k_mem_pool_alloc(3000) expected 0, got -11

FAIL - test_pool_block_get_wait
===================================================================
starting test - test_pool_malloc
PASS - test_pool_malloc
===================================================================
Test suite mempool failed.
===================================================================
PROJECT EXECUTION FAILED

Still failing.

@andrewboie andrewboie added this to the v2.4.0 milestone May 12, 2020
Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just throwing up my +1 for when this exits the CI banging phase. These all look great to me.

@npitre
Copy link
Author

npitre commented May 12, 2020 via email

First, some renames to make accessors more explicit:

  size() --> chunk_size()
  used() --> chunk_used()
  free_prev() --> prev_free_chunk()
  free_next() --> next_free_chunk()

Then, the return type of chunk_size() is changed from chunkid_t to
size_t, and chunk_used() from chunkid_t to bool.

The left_size() accessor is used only once and can be easily substituted
by left_chunk(), so it is removed.

And in free_list_add() the variable b is renamed to bi so to be
consistent with usage in sys_heap_alloc().

Signed-off-by: Nicolas Pitre <[email protected]>
Nicolas Pitre added 8 commits June 8, 2020 14:43
Let's provide accessors for getting and setting every field to make the
chunk header layout abstracted away from the main code. Those are:

SIZE_AND_USED: chunk_used(), chunk_size(), set_chunk_used() and
chunk_size().

LEFT_SIZE: left_chunk() and set_left_chunk_size().

FREE_PREV: prev_free_chunk() and set_prev_free_chunk().

FREE_NEXT: next_free_chunk() and set_next_free_chunk().

To be consistent, the former chunk_set_used() is now set_chunk_used().

Signed-off-by: Nicolas Pitre <[email protected]>
By storing the used flag in the LSB, it is no longer necessary to have
a size_mask variable to locate that flag. This produces smaller and
faster code.

Replace the validation check in chunk_set() to base it on the storage
type.

Also clarify the semantics of set_chunk_size() which allows for clearing
the used flag bit unconditionally which simplifies the code further.

The idea of moving the used flag bit into the LEFT_SIZE field was
raised. It turns out that this isn't as beneficial as it may seem
because the used bit is set only once i.e. when the memory is handed off
to a user and the size field becomes frozen at that point. Modifications
on the leftward chunk may still occur and extra instructions to preserve
that bit would be necessary if it were moved there.

Signed-off-by: Nicolas Pitre <[email protected]>
It is possible to remove a few fields from struct z_heap, removing
some runtime indirections by doing so:

- The buf pointer is actually the same as the struct z_heap pointer
  itself. So let's simply create chunk_buf() that perform a type
  conversion. That type is also chunk_unit_t now rather than u64_t so
  it can be defined based on CHUNK_UNIT.

- Replace the struct z_heap_bucket pointer by a zero-sized array at the
  end of struct z_heap.

- Make chunk #0 into an actual chunk with its own header. This allows
  for removing the chunk0 field and streamlining the code. This way
  h->chunk0 becomes right_chunk(h, 0). This sets the table for further
  simplifications to come.

Signed-off-by: Nicolas Pitre <[email protected]>
With this we can remove magic constants, especially those used with
big_heap().

Signed-off-by: Nicolas Pitre <[email protected]>
We already have chunk #0 containing our struct z_heap and marked as
used. We can add a partial chunk at the very end that is also marked
as used. By doing so there is no longer a need for checking heap
boundaries at run time when merging/splitting chunks meaning fewer
conditionals in the code's hot path.

Signed-off-by: Nicolas Pitre <[email protected]>
Avoid redundancy and bucket_idx() usage when possible.

Signed-off-by: Nicolas Pitre <[email protected]>
Make the LEFT_SIZE field first and SIZE_AND_USED field last (for an
allocated chunk) so they sit right next to the allocated memory. The
current chunk's SIZE_AND_USED field points to the next (right) chunk,
and from there the LEFT_SIZE field should point back to the current
chunk. Many trivial memory overflows should trip that test.

One way to make this test more robust could involve xor'ing the values
within respective accessor pairs. But at least the fact that the size
value is shifted by one bit already prevent fooling the test with a
same-byte corruption.

Signed-off-by: Nicolas Pitre <[email protected]>
This struct is taking up most of the heap's constant footprint overhead.
We can easily get rid of the list_size member as it is mostly used to
determine if the list is empty, and that can be determined through
other means.

Signed-off-by: Nicolas Pitre <[email protected]>
@carlescufi
Copy link
Member

There are some build failures here that I am not sure are attributable to this PR's changes. @nashif any ideas?

@carlescufi
Copy link
Member

The current failure in qemu_arc_em is almost certainly unrelated to this PR.

Failures

 Sanitycheck / qemu_arc_em:tests/kernel/lifo/lifo_usage/kernel.lifo.usage / tests/kernel/lifo/lifo_usage/kernel.lifo.usage

*** Booting Zephyr OS build zephyr-v2.3.0-488-g6c7ca9602316  ***
Running test suite test_lifo_usage
===================================================================
starting test - test_lifo_nowait
PASS - test_lifo_nowait
===================================================================
starting test - test_lifo_wait
PASS - test_lifo_wait
===================================================================
starting test - test_timeout_empty_lifo
PASS - test_timeout_empty_lifo
===================================================================
starting test - test_timeout_non_empty_lifo
PASS - test_timeout_non_empty_lifo
===================================================================
starting test - test_timeout_lifo_thread
PASS - test_timeout_lifo_thread
===================================================================
starting test - test_timeout_threads_pend_on_lifo
 thread (q order: 2, t/o: 0, lifo 0x8040621c)

    Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/lifo/lifo_usage/src/main.c:140: test_multiple_threads_pending: (data->timeout_order not equal to ii)
 *** thread 2 woke up, expected 1

FAIL - test_timeout_threads_pend_on_lifo
===================================================================
Test suite test_lifo_usage failed.
===================================================================
PROJECT EXECUTION FAILED

@stephanosio
Copy link
Member

stephanosio commented Jun 21, 2020

The current failure in qemu_arc_em is almost certainly unrelated to this PR.

@carlescufi The QEMU ARC platform does not have icount enabled and can be quite unstable when running sanitychecks (basically the same issue we had with other platforms earlier). Re-triggering should fix the problem.

EDIT: This is not an intermittent failure. I just locally ran this test and it consistently fails (both on the latest master and in this PR). It looks like this test is actually broken (see #26163, disabled in #26328).

@carlescufi carlescufi merged commit ad59e92 into zephyrproject-rtos:master Jun 21, 2020
@npitre npitre deleted the sys-heap-2_np branch February 11, 2021 00:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: API Changes to public APIs area: Kernel area: Tests Issues related to a particular existing or missing test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants