Skip to content

Conversation

@pabigot
Copy link
Contributor

@pabigot pabigot commented Dec 30, 2018

This PR marks sys_dnode_t instances that are not in a list so that code does not depend on overloaded external state to assess this condition before potentially violating dlist API requirements.

A second commit uses that API to avoid a double-remove in timeout that could result in a corrupted list of timers, as an alternative solution to a commit in #12204 (for full context see this discussion if that doesn't disappear when the commit goes away.)

A third commit works around an unknown double-remove bug in the kernel poll infrastructure. Why the events are being removed multiple times should probably be investigated by a kernel maintainer.

Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally in favor, though it would be good to get a feel for how many list operations are being affected and what the performance and code size impact (sys_dlist_remove is generally inlined) is likely to be.

But yeah, we need to fix and not work around that spot in timer that's failing.

kernel/poll.c Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good catch, and it's surprising that it didn't cause failures already (I guess the existing tests would all exercise the double remove only in circumstances where the list hadn't otherwise changed and the action would be idempotent). I wonder idly if it's not a better idea to put that test into sys_dlist_remove() as long as we're eating the extra cycles...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sys_dlist_remove already documents that the node must be part of the list, and in most situations that's true (and known at the call-site). So I don't think there's reason to add a check in that function.

@pabigot
Copy link
Contributor Author

pabigot commented Jan 3, 2019

Rebased on top of merged #12204, added commit to remove use of _INACTIVE as a flag value to detect timeouts that are not active.

Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One spot I think is wrong, a few other suggestions.

And I do still think we should pull up some size comparisons on stuff like tests/benchmarks/footprint to see how much this affects code size. I'd guess that this is a worthwhile tradeoff too, but it'd be nice to prove that.

kernel/sched.c Outdated
Copy link
Contributor

@andyross andyross Jan 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick on behalf of @ceolin : MISRA rules disallow implicit conversion to bool

But I think this is deeper. The old code would store the "was linked" state implicit in the timeout, but that's no longer ture, right? You don't know that a zero return here means "inactive", I think? Might need to be reworked to do a is_linked call first and then use that as the predicate for the if()?

Actually a quick grep shows that this one of two places actually using the return value of _abort_(thread_?)timeout(). Maybe the simpler thing to do would be to just remove that return value and/or swap it to a boolean that indicates "was active".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old code returned zero on successful abort, and _INACTIVE on failure. Returning -EINVAL is more consistent with indicating that the abort failed. I really think this is doing the right thing.

I'll change the test to detect a negative result, though, instead of implicit bool although there's an awful lot of implicit bool in Zephyr.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not particularly relevant, but this was always a bad API (a true return indicates a false state) that never got fixed up during all the refactoring. I wouldn't cry to see this patch change it to z_timeout_active() or whatever.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was partly because of the legacy of _EXPIRED: timeouts could be in three states, not two. This function tests that the state is exactly what _INACTIVE indicated.

A true result indicates that the predicate identified by the function name is true. Why is that a bad API? We use sys_dlist_is_empty(), not sys_dlist_is_nonempty().

The whole timeout API uses prefix underscore. While I'd rather that wasn't the case (violates C reserved identifiers for the non-static functions) I wouldn't want to change just one function, nor change all of them, in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A true result indicates that the predicate identified by the function name is true

I don't think that not incorrect, exactly, but it's not innapropriate to argue against the ellision of unommitted negatives either.

No seriously, double negatives in API choices are bad, especially in predicates. The base adjective for that state is "active". True should mean "active", false should mean "inactive".

kernel/timeout.c Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned elsewhere, probably best to just eliminate the return value here or make it a boolean to indicate "was active", as that's all existing callers care about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return indicates the attempt to abort the timeout failed because the timeout wasn't active (valid for abort). I believe this is the preferred way to indicate the result of the operation.

@pabigot pabigot force-pushed the pr/20181230a branch 2 times, most recently from ef85219 to 7719457 Compare January 3, 2019 20:19
@pabigot
Copy link
Contributor Author

pabigot commented Jan 3, 2019

@andyross Thanks for the review; I've responded to some of the comments explaining the rationale, and made the one change for implicit bool. Let me know if I'm convincing about keeping the existing API.

BTW: With this patch set mps2_an385 intermittently fails tests/kernel/mem_slab/mslab_threadsafe/kernel.memory_slab due to an attempt to remove an already-removed dnode. I'm still looking for that.

@andrewboie
Copy link
Contributor

Good stuff here, although I share Andy's concerns about performance impact since dlists tend to be pretty hot code paths, perhaps consider ifdefing the extra checking around CONFIG_ASSERT or something like that?

@pabigot
Copy link
Contributor Author

pabigot commented Jan 3, 2019

There are two concerns: impact on code size from the added null assignments, and impact on performance of the null assignments (and, rarely, checks; most of the time the linked state does not need to be checked).

I can run comparisons for code size, if somebody tells me what platform(s) and what test application(s) to use. What are the criteria for acceptable/unacceptable impact of the change?

Performance is a different issue. First, I'd need instruction on how that would be evaluated in a meaningful way.

Second, the changes here have two effects: bugs where lists would be corrupted by double removes are converted from undetected corruption to faults at point-of-use; and a way to reliably identify nodes that are not in a list, as is needed with timers. The first is a quality issue, and we could certainly turn it off in some cases. The second is a functional capability, which would break if it were optional.

Note that the Linux dlist implementation invalidates the pointers when nodes are removed. They seem to believe any performance impact is negligible relative to the value of detecting failures.

What do you want me to do?

@andrewboie
Copy link
Contributor

Note that the Linux dlist implementation invalidates the pointers when nodes are removed. They seem to believe any performance impact is negligible relative to the value of detecting failures.

FWIW, Linux isn't targeting microcontrollers with memory on the order of 64K (or less)

way to reliably identify nodes that are not in a list, as is needed with timers.

Fair enough!

Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing -1, as I'm convinced the behavior looks correct. I'd still prefer to see _abort_timeout()'s signature matched to the needs to the needs of its actual callers instead of trying to return an error code (it's an internal function, not an API for users outside of the kernel -- hell, even within the kernel it's used only at the lowest level), but that can be fixed whenever.

@pabigot
Copy link
Contributor Author

pabigot commented Jan 3, 2019

@andrewboie @andyross Thanks for the approvals. Please take the following into account before merging:

The intermittent failure of mps2_an385 on tests/kernel/mem_slab/mslab_threadsafe/kernel.memory_slabs is due to MPU or BUS errors derefencing invalidated node link pointers during a call to _priq_dumb_remove.

By instrumenting the code I can tell that thread->base.qnode_dlist (which is actually used as dnode) has non-null next and prev pointers prior to invoking sys_dlist_remove but at least one null next or prev pointer within sys_dlist_remove. (Neither is a pointer-to-self, either.)

This appears to be a bug identified by this PR, not one introduced by this PR, although the cause is unclear. Unlike the poll bug no workaround seems to be effective. I don't know how to debug this further.

Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, wait. Sorry, I missed that warning earlier. Yeah, we for sure need to get any introduced failures addressed before this merges. If you want me to look at it, can you put together a reproducer and file it as a bug?

@pabigot
Copy link
Contributor Author

pabigot commented Jan 3, 2019

Your approvals came in while I was doing final edits on the warning; glad you caught it.

https://github.com/pabigot/zephyr/commits/diag/20181230a has a diagnostic commit on top of this PR. The commit message describes how to reproduce the problem. (The 10563 error code indicates the failing call came from line 563 of the instrumented sched.c under branch condition 10000.)

Thanks for looking at this.

@andyross
Copy link
Contributor

andyross commented Jan 4, 2019

Spent a bit on this. Turns out you can make it happen just by putting the test suite in a while(1), which is helpful. It happens less if you disable memory protection (not just userspace, interestingly) but doesn't disappear.

Some work wrapping functions with macros so I can use LINE to get the caller (i.e. the poor man's stack trace) showed that the specific failure you're seeing is in the timeslicing code that is trying to move the active thread to the end of the run queue (which explains why this is the test that fails: it's somewhat inexplicably configured to timeslice at 1kHz!). But _current has nulls in the list entries when it SHOULD be in the run queue! Backing out from that to the very top of the timer ISR and I see the same thing, timeslice expiration or no (which makes the bug much easier to hit).

So something is either interrupting a spot where the list is in flux (which shouldn't happen, and AFAICT isn't) or something is scribbling on the list node, which just happens to live at the very beginning of the thread object... So... I added some dummy buffer fields in the thread struct in front of the list node to try to catch the overrun, and the system doesn't boot.

So at this point I'm reasonably certain there's something in the ARM code that isn't properly respecting the thread layout and writing over its first 8 bytes with nulls. Probably would be easy enough to find with a debugger and some scripted watchpoints, but I'm useless with tooling. Will play more tomorrow.

@pabigot
Copy link
Contributor Author

pabigot commented Jan 4, 2019

@andyross Thanks for looking at this. Your description provided some helpful clues.

I was able to replicate the observation of _current having null links in the context of the timer ISR. In my case it occurred because the main thread had exited (removing itself from all run queues) but was still recorded as the value of _current. The tick ISR was still active and when it fired tried moved _current to the back of the queue, which caused havoc since it wasn't in a queue at that point. There might be a better fix, assuming this situation can ever occur in an unhacked kernel; nonetheless I resolved it by changing sliceable to ignore unrunnable threads.

After that red herring was properly tinned I finally found the real problem: mem_slab retains a queue of threads that want to allocate from it that should be scheduled on memory release, but the operation to pull threads from a wait queue didn't handle the case where the queue was empty, at which point it queued garbage. That's also been fixed.

Finally, I corrected the name of the link field in _thread_base so we no longer appear to be appending lists to lists.

The problematic test runs reliably now.

(Actually, as I think about it, the original CONTAINER_OF(NULL, struct, field) from an empty queue would have returned a null pointer and shouldn't have caused problems without the addition of padding to tje start of _thread_base, which instead made it return 0xfffffffc. So the test failure might really have been related to the first problem of switching to a thread that was blocked on the slab wait queue, if those threads were detected as sliceable. If so, maybe the behavior of _pend_current_thread needs to be examined.

Whatever, it works now)

@pabigot
Copy link
Contributor Author

pabigot commented Jan 4, 2019

My last patch set solved the problem, by breaking timeslicing. The proper explanation and solution is in this commit:

kernel: sched: inhibit timeslice from unqueued thread

If timeslicing is enabled z_clock_announce will attempt to swap the
current thread for another. The implementation of z_time_slice
assumes that _current is always in the queue, and so moves it, which
relies on valid links in the dnode structure.

When the current thread is in the process of pending itself the thread
may have already been removed from the queue at the point the tick ISR
runs. This violates the requirement in _move_thread_to_end_of_prio_q
that the thread already be in the queue.

If the current thread is not queued treat it as unsliceable: it's in the
process of being swapped anyway.

I'm leaving this one in place for now:

kernel: sched: avoid potential null pointer dereference in wait queue

The existing code has a too-subtle reliance on the sys_dnode_t instance
being located at the start of its container: this is the only situation
where CONTAINER_OF(NULL, struct, tag) will be guaranteed to produce a
null pointer.

When this required layout is violated, e.g. during diagnostics, the
system fails horribly: for example an empty wait queue will result in an
attempt to operate on a thread that does not exist.

Update all _priq_foo_best implementations to explicitly check for a null
pointer rather than relying on the dnode being first in the thread
structure.

because the original code was way too subtle, wasting time for both @andyross and me. However, the checks cost 12 bytes to add under SDK 0.9.5, and 16 bytes under gcc 8.2.1, so you can tell me to remove it.

@carlescufi
Copy link
Member

@pabigot please rebase

@carlescufi
Copy link
Member

@andyross if you are happy with this please approve so we can move this forward

@pabigot pabigot removed the DNM This PR should not be merged (Do Not Merge) label Jan 15, 2019
@pabigot pabigot changed the title [DNM] extend dlist to detect unlinked nodes and avoid double-remove bugs extend dlist to detect unlinked nodes and avoid double-remove bugs Jan 15, 2019
@pabigot
Copy link
Contributor Author

pabigot commented Jan 15, 2019

@andrewboie @andyross in the churn of this PR the following additional commit got added after the original review process. If this is OK I'd like to leave it, if not I can pull it out and resubmit it in another PR that's blocked by the dlist features.

kernel: timeout: detect inactive timeouts using dnode linked state

Whether a timeout is linked into the timeout queue can be determined
from the corresponding sys_dnode_t linked state. This removes the need
to use a special flag value in dticks to determine that the timeout is
inactive.

Update _abort_timeout to return an error code, rather than the flag
value, when the timeout to be aborted was not active.

Remove the _INACTIVE flag value, and replace its external uses with an
internal API function that checks whether a timeout is inactive.

@pabigot
Copy link
Contributor Author

pabigot commented Jan 15, 2019

@andrewboie @andyross @carlescufi Actually, hold on: to make things easier I'm going to close this one, and submit the one that has the combined dlist and timer updates for #12332 together.

@pabigot
Copy link
Contributor Author

pabigot commented Jan 15, 2019

Reopening since Andy doesn't like the part of #12485 that motivated putting it in a separate PR. So this is back to being the original content, plus the removal of the magic _INACTIVE flag value.

@pabigot pabigot reopened this Jan 15, 2019
Although sys_dnode_t and sys_dlist_t are aliases, their roles are
different and they appear in different positions in dlist API calls.

Signed-off-by: Peter A. Bigot <[email protected]>
CONTAINER_OF() on a NULL pointer returns some offset around NULL and not
another NULL pointer.  We have to check for that ourselves.

This only worked because the dnode happened to be at the start of the
struct.

Signed-off-by: Peter A. Bigot <[email protected]>
The original implementation allows a list to be corrupted by list
operations on the removed node.  Existing code attempts to avoid this by
using external state to determine whether a node is in a list, but this
is fragile and fails when the state that holds the flag value is changed
after the node is removed, e.g. in preparation for re-using the node.

Follow Linux in invalidating the link pointers in a removed node.  Add
API so that detection of particpation in a list is available at the node
abstraction.

This solution relies on the following steady-state invariants:
* A node (as opposed to a list) will never be adjacent to itself in a
  list;
* The next and prev pointers of a node are always either both null or
  both non-null.

Signed-off-by: Peter A. Bigot <[email protected]>
Use the new generic capability to detect unlinked sys_dnode_t instances.

Signed-off-by: Peter A. Bigot <[email protected]>
k_poll events are registered in a linked list when their signal
condition has been met.  The code to clear event registration did not
account for events that were not registered, resulting in double-removes
that produced core dumps on native-posix sanitycheck.

Signed-off-by: Peter A. Bigot <[email protected]>
Whether a timeout is linked into the timeout queue can be determined
from the corresponding sys_dnode_t linked state.  This removes the need
to use a special flag value in dticks to determine that the timeout is
inactive.

Update _abort_timeout to return an error code, rather than the flag
value, when the timeout to be aborted was not active.

Remove the _INACTIVE flag value, and replace its external uses with an
internal API function that checks whether a timeout is inactive.

Signed-off-by: Peter A. Bigot <[email protected]>
@pabigot
Copy link
Contributor Author

pabigot commented Jan 15, 2019

@andyross Reverted the change of the structure tag per #12485 (comment)

@pabigot
Copy link
Contributor Author

pabigot commented Jan 18, 2019

@andyross this is back to the original submitted content, plus the removal of the inactive magic value and restoring the qnode_dlist spelling. Is it OK now?

@carlescufi carlescufi added the dev-review To be discussed in dev-review meeting label Jan 22, 2019
@carlescufi
Copy link
Member

@andyross can you please review the last commit in this series?

@carlescufi carlescufi merged commit b4ece0a into zephyrproject-rtos:master Jan 23, 2019
@carlescufi
Copy link
Member

Thanks @andyross

@pabigot pabigot deleted the pr/20181230a branch January 23, 2019 21:04
@pabigot pabigot removed the dev-review To be discussed in dev-review meeting label Jan 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants