Skip to content

Conversation

@nashif
Copy link
Member

@nashif nashif commented Jun 7, 2019

Introduce runtime error checking in the kernel where it make sense and remove duplication of asserts and oopses that were used to catch such errors. Propagate errors where needed.

Ideally we want to use this to check for parameters and permissions and return to the caller to allow recovery, which was not possible in the current approach: when asserts are enabled (not default) we would just crash on every error, even if recoverable or we will get undefined behavior if asserts were not enabled.

Situations which should never happen and disallowed by design, such as running something in an ISR when it should not, remain as asserts where applicable.


(Edit by @aescolar) Discussed in #17105

@nashif nashif added the DNM This PR should not be merged (Do Not Merge) label Jun 7, 2019
@zephyrbot zephyrbot added area: API Changes to public APIs area: Tests Issues related to a particular existing or missing test area: Kernel labels Jun 7, 2019
@zephyrbot
Copy link

zephyrbot commented Jun 7, 2019

All checks are passing now.

Review history of this comment for details about previous failed status.
Note that some checks might have not completed yet.

@nashif nashif force-pushed the mutex_cleanup branch 2 times, most recently from e67ea4e to 334a863 Compare June 17, 2019 19:29
@nashif nashif removed the DNM This PR should not be merged (Do Not Merge) label Jun 18, 2019
kernel/queue.c Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitipick: if the list is empty, the function could just return here?

Copy link
Contributor

@andrewboie andrewboie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can get rid of some of the checks in sys_mutex code since the base implementation is now doing it.

kernel/pipes.c Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make it boolean operations ? e.g z_waitq_head(&pipe->wait_q.readers) != NULL

include/kernel.h Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change it if this function always return 0 ? There is nothing to check here.

aescolar
aescolar previously approved these changes Dec 3, 2019
Copy link
Member

@aescolar aescolar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, you are not approving?

I'm not the kernel maintainer, so it is for me to give a +2.
My concerns have been addressed. So this is just a +0.

Copy link
Member

@ioannisg ioannisg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor comments from my side

arch/Kconfig Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe some text in the help-text should be added, to justify the dependency

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

early return here, instead?

@andrewboie
Copy link
Contributor

My concerns have been addressed. So this is just a +0.

@aescolar this isn't helpful.
You obviously have strong opinions about this topic, you are allowed to give a +1

@aescolar
Copy link
Member

aescolar commented Dec 3, 2019

you are allowed to give a +1

@andrewboie Thanks. I understand I'm allowed to approve the PR. But the thing is that I do not support the overall direction things are going, but I will not oppose it either while the inconvenience for my company or others trying to target constrained platforms are not too great.

@ioannisg
Copy link
Member

ioannisg commented Dec 3, 2019

you are allowed to give a +1

@andrewboie Thanks. I understand I'm allowed to approve the PR. But the thing is that I do not support the overall direction things are going, but I will not oppose it either while the inconvenience for my company or others trying to target constrained platforms are not too great.

@aescolar @nashif as far I understood, it is possible for this patch to have no overhead, if the introduced Kconfig switches are set properly, i.e. disable both ASSERT and NO_RUNTIME error checking, am I correct?

@aescolar
Copy link
Member

aescolar commented Dec 3, 2019

@ioannisg My concern is two fold:

  • My approval of this PR may be miss-represented as support of further changes. And it is not clear to me what will happen after.
  • Already @nashif is implying that testing with anything else than RUNTIME_ERROR_CHECKS will not be done.

@andrewboie
Copy link
Contributor

But the thing is that I do not support the overall direction things are going

What? Be specific

@aescolar
Copy link
Member

aescolar commented Dec 3, 2019

What? Be specific

Disregard for platform constrained targets.
Over-prioritization of some assumed to be requirements of functional safety at the expense of any other target for the project.

@nashif
Copy link
Member Author

nashif commented Dec 3, 2019

Over-prioritization of some assumed to be requirements of functional safety at the expense of any other target for the project.

This has nothing to do with functional safety. The fact that we are not able to test basic APIs because we just crash is bad for EVERYONE.

@andrewboie
Copy link
Contributor

Disregard for platform constrained targets.

@aescolar let's explore this.
How does this PR disregard platform constrained targets?
Be specific.

@aescolar
Copy link
Member

aescolar commented Dec 3, 2019

@aescolar let's explore this.
How does this PR disregard platform constrained targets?

The fact that I removed my -1 would be an indication that it does not anymore. Wouldn't it?

@nashif nashif force-pushed the mutex_cleanup branch 3 times, most recently from 2a4b2ac to 4db0fdf Compare January 16, 2020 16:35
nashif added 11 commits January 20, 2020 10:38
Define there options for runtime error handling:
- assert on all errors (ASSERT_ON_ERRORS)
- no runtime checks (no asserts, no runtime error handling)
  (NO_RUNTIME_CHECKS)
- full runtime error handling (the default) (RUNTIME_ERROR_CHECKS)

Signed-off-by: Anas Nashif <[email protected]>
k_mutex_unlock will now perform error checking and return on failures.

If the current thread does not own the mutex, we will now return -EPERM.
In the unlikely situation where we own a lock and the lock count is
zero, we assert. This is considered an undefined bahviour and should not
happen.

Signed-off-by: Anas Nashif <[email protected]>
Check for errors at runtime and stop depending on ASSERTs.
This changes the API for
- k_sem_init

k_sem_init now returns -EINVAL on invalid data.

Signed-off-by: Anas Nashif <[email protected]>
Remove static helper functions used only once and integrate them into
calling functions.
In k_sem_take, return at the end.

Signed-off-by: Anas Nashif <[email protected]>
Add runtime error checking to k_pipe_cleanup and k_pipe_get and remove
asserts.
Adapted test which was expecting a fault to handle errors instead.

Signed-off-by: Anas Nashif <[email protected]>
When trying to cancel a NULL work queue return -EAGAIN.

Signed-off-by: Anas Nashif <[email protected]>
Add runtime error checking for k_mem_slab_init and replace asserts with
returning error codes.

Signed-off-by: Anas Nashif <[email protected]>
Add runtime error handling for k_msgq_cleanup. We return 0 on success
now and -EAGAIN when cleanup is not possible.

Signed-off-by: Anas Nashif <[email protected]>
Add runtime error checking for both k_stack_push and k_stack_cleanup.

Signed-off-by: Anas Nashif <[email protected]>
Runtime error handling for k_queue_append_list and k_queue_merge_slist.

Signed-off-by: Anas Nashif <[email protected]>
Add tests for k_sem_init.

Signed-off-by: Anas Nashif <[email protected]>
@nashif nashif merged commit 2711c4e into zephyrproject-rtos:master Jan 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: API Changes to public APIs area: Kconfig area: Kernel area: Tests Issues related to a particular existing or missing test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants