Checking the stack sentinel may abort the current thread, so make
this check before we determine which thread to run next.
Fixes: #15037
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
The controlling expression of if and iteration statements must have a
boolean type.
MISRA-C rule 14.4
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
The BIT macro now uses an unsigned int, avoiding implementation-defined
behavior when shifting signed types.
MISRA-C rule 10.1
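For reference, a minimal sketch of the shape of the macro (hedged: the
in-tree definition may differ in detail); shifting an unsigned constant
keeps bit 31 well-defined, where shifting a signed 1 would not be:

  /* Sketch only; the in-tree definition may differ. */
  #define BIT(n) (1U << (n))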
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
On SMP, there was a bug where the logic that re-adds _current to the
run queue at swap time would accidentally reschedule threads that had
just gone to sleep, because the is_thread_prevented_from_running()
predicate only tests for threads that are "suspended" or "pending" and
not sleeping.
Overload _THREAD_SUSPENDED to indicate "sleeping" also. Simple fix
for an immediate bug, though long term we really want to unify all the
blocked conditions to prevent this kind of state bug.
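For illustration, a hedged sketch of the predicate in question
(state-bit and field names approximate the kernel internals, not the
exact source); with the overload, a sleeping thread carries
_THREAD_SUSPENDED and is caught by the same test:

  /* Approximate sketch, not the exact kernel source. */
  static inline bool is_thread_prevented_from_running(struct k_thread *t)
  {
          u8_t state = t->base.thread_state;

          /* Sleeping threads now also carry _THREAD_SUSPENDED. */
          return (state & (_THREAD_PENDING | _THREAD_SUSPENDED)) != 0U;
  }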
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Daniel Leung caught a good one: In the (SMP) case where we were
aborting a thread that was not currently scheduled, we were flagging
the DEAD state on _current and not the thread we were aborting! This
wasn't as fatal as it seems, as the thread that called z_sched_abort()
would effectively go on living (as a zombie?) in a state where it
would always be preempted, but would otherwise remain schedulable.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The workaround for nonatomic swap had yet another edge case: it would
save off the _current pointer when pending a thread so that the next
time slice interrupt could test it to see if the swap had actually
happened before assuming that _current could be rescheduled (if it
just pended itself, that's impossible). Then it would clear the
pending_current pointer so future interrupts wouldn't be confused.
BUT: it turns out that qemu, when faced with really rapid timer rates
that exceed its (host-based) timing accuracy, is perfectly willing to
"stack up" timer interrupts such the one goes pending before the
previous one is finished executing. In that case, we can enter the
SECOND timer interrupt, to try timeslicing a SECOND time, STILL before
the PendSV exception has run to actually effect the context switch.
Except this time pending_current has been cleared and we try to
reschedule the pended _current thread incorrectly. In theory real
hardware could do this too, though it would involve absolutely crazy
interrupt latency problems.
Work around this by moving the clear to the thread itself: immediately
after it wakes up from the pend call, it retakes a lock and clears
pending_current if it still matches _current. That is not a perfect
fix: there remains a 2-3 instruction window, between returning from
the pend and locking interrupts again, in which a timer interrupt will
see a stale pointer. But I hammered at this and
couldn't make qemu do that (i.e. return from a timer interrupt but
flag a new one in just a cycle or two).
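Roughly, the shape of the workaround in the pend path (illustrative
sketch; names follow the commit text rather than the exact source):

  /* Illustrative sketch, not the exact kernel source. */
  static struct k_thread *pending_current;   /* checked by the timeslice ISR */
  static struct k_spinlock sched_spinlock;

  static int pend_and_clear(u32_t key)
  {
          pending_current = _current;        /* stash before swapping away */
          int ret = z_swap_irqlock(key);     /* pend: we resume here later */

          /* Back on this thread: clear our own stash under a lock, so a
           * stacked-up timer interrupt can never see a stale pointer.
           */
          k_spinlock_key_t k = k_spin_lock(&sched_spinlock);

          if (pending_current == _current) {
                  pending_current = NULL;
          }
          k_spin_unlock(&sched_spinlock, k);

          return ret;
  }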
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This was always doing a remove/add of the _current thread to the run
queue, which is wrong because in SMP _current isn't in the queue to
remove. But it went undetected until the recent dlist changes.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
In SMP, we are setting the _current pointer while holding the
scheduler spinlock locally, which means that when we try to release it
the validation layer (not the spinlock per se) will scream at us
because the thread that took the lock doesn't match the one releasing
it.
Special case this when validation is enabled.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The tracing fixes in commit e87193896a ("subsys: debug: tracing: Fix
thread tracing") were... not a readability win. The point appears to
have been to put a tracing hook immediately before and after the
assignment to the _current pointer. So do that in an abstracted
function and clean up _get_next_switch_handle() (which is a subtle and
important function already polluted with some unavoidable preprocessor
testing!)
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Currently thread abort doesn't work if a thread is currently scheduled
on a different CPU, because we have no way of delivering an interrupt
to the other CPU to force the issue. This patch adds a simple
framework for an architecture to provide such an IPI, implements it
for x86_64, and uses it to implement a spin loop in abort for the case
where a thread is currently scheduled elsewhere.
On SMP architectures (xtensa) where no such IPI is implemented, we
fall back to waiting on an arbitrary interrupt to occur. This "works"
for typical code (and all current tests), but of course it cannot be
guaranteed on such an architecture that k_thread_abort() will return
in finite time (e.g. the other thread on the other CPU might have
taken a spinlock and entered an infinite loop, so it will never
receive an interrupt to terminate itself)!
On non-SMP architectures this patch changes no code paths at all.
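Hedged sketch of the abort path described above; the config symbol and
hook name follow later Zephyr conventions (CONFIG_SCHED_IPI_SUPPORTED,
arch_sched_ipi()) and the helper is hypothetical:

  /* Hedged sketch; thread_is_running_elsewhere() is a hypothetical helper. */
  static void wait_for_remote_abort(struct k_thread *thread)
  {
  #ifdef CONFIG_SCHED_IPI_SUPPORTED
          arch_sched_ipi();       /* kick the other CPU into its scheduler */
  #endif
          /* Spin until the target is no longer _current on another CPU.
           * Without an IPI this relies on that CPU taking *some* interrupt,
           * so finite-time termination cannot be guaranteed.
           */
          while (thread_is_running_elsewhere(thread)) {
                  /* busy-wait */
          }
  }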
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
In SMP, _current is not "queued". (The run queue only stores
unscheduled threads because we can't rely on the head of the list
being _current). We weren't updating the cache choice, which would
flag swap_ok, so calling k_thread_abort(_current) (for example, when a
thread exits from its entry function) would try to switch back into
the thread and then run off the end of the function.
Amusingly this was more benign than you'd think. Stumbled on it by
accident.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Update reserved function names starting with one underscore, replacing
them as follows:
'_k_' with 'z_'
'_K_' with 'Z_'
'_handler_' with 'z_handl_'
'_Cstart' with 'z_cstart'
'_Swap' with 'z_swap'
This renaming is done on both global and those static function names
in kernel/include and include/. Other static function names in kernel/
are renamed by removing the leading underscore. Other function names
not starting with any prefix listed above are renamed starting with
a 'z_' or 'Z_' prefix.
Function names starting with two or three leading underscores are not
automatically renamed since these names will collide with the variants
with two or three leading underscores.
Various generator scripts have also been updated, as well as perf,
linker and usb files. These are:
drivers/serial/uart_handlers.c
include/linker/kobject-text.ld
kernel/include/syscall_handler.h
scripts/gen_kobject_list.py
scripts/gen_syscall_header.py
Signed-off-by: Patrik Flykt <patrik.flykt@intel.com>
Rename scheduler spinlock sched_lock to sched_spinlock as it will
collide with the cleanup of the reserved function name _sched_lock(),
which will also be called sched_lock().
Signed-off-by: Patrik Flykt <patrik.flykt@intel.com>
Nonatomic swap strikes again. These issues are all longstanding, but
were unmasked by the dlist work in commit d40b8ce1fb ("sys: dlist:
Add sys_dnode_is_linked") where list node pointers become nulls on
removal.
The previous fix was for a specific case where a timeslicing interrupt
would try to slice out the "wrong" current thread because the thread
has "just" pended itself. That was incomplete, because the parallel
code in k_sleep() didn't flag itself the same way.
And beyond that, it turns out to be basically impossible (now that I'm
thinking about it correctly) to prevent interrupt code from calling
into the scheduler to suspend a "just pended but not quite" current
and/or preempt away to another thread. In any of these cases, the
scheduler modifications to the state bits remain correct but the queue
nodes may be corrupt because the thread was already removed from the
ready queue. So we have to test and correct this at the lowest level,
where a thread is being removed from a priq: check that (1) the queue
is the ready queue and not a waitq, (2) the thread is the current
thread, and (3) it is already marked suspended and thus not in the
queue.
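As a hedged illustration of that lowest-level guard (helper shape and
field names are approximate, not the exact source), the removal path
can skip the dlist operation when all three conditions hold:

  /* Approximate sketch of the guard; not the exact kernel source. */
  static bool pended_current_already_removed(sys_dlist_t *pq,
                                             struct k_thread *thread)
  {
          return (pq == &_kernel.ready_q.runq) &&    /* (1) ready queue */
                 (thread == _current) &&             /* (2) current thread */
                 ((thread->base.thread_state & _THREAD_SUSPENDED) != 0U);
  }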
There are lots of existing issues filed in the last few months all
pointing to odd instability on ARM platforms. I'm reasonably certain
this is the root cause for most or all of them.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This API doesn't use the normal thread priority comparison itself, so
doesn't get the magic that thread_base.prio provides. If called when
another thread should be run, this would preempt the current thread
always, even if the scheduler lock was taken.
That was benign until recent spinlockification exposed it: a mutex in
the philosophers test run in preempt_only mode would swap away while
holding a spinlock (which used to work with irq locks) and fail later
with a "recursive" spinlock assert.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The k_sleep() locking was actually to protect the _current state from
preemption before the context switch, so document that and replace
with a spinlock. Should probably unify this with the rather cleaner
logic in pend_curr(), but right now "sleeping" and "pended" are
needlessly distinct states.
And we can remove the locking entirely from k_wakeup(). There's no
reason for any of that to need to be synchronized. Even if we're
racing with other thread modifications, the state on exit will be a
runnable thread without a timeout, or whatever timeout/pend state the
other side was requesting (i.e. it's a bug, but not one solved by
synchronization).
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
These functions, for good design reason, take a locking key to
atomically release along with the context switch. But there's still a
common pattern in code to do a switch unconditionally by passing
irq_lock() directly. On SMP that's a little hurtful as it spams the
global lock. Provide an _unlocked() variant for
_Swap/_reschedule/_pend_curr for simplicity and efficiency.
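As a sketch, the unlocked variants are just thin wrappers that take
the global interrupt lock themselves (approximate; similar one-liners
cover _Swap() and _pend_curr()):

  /* Approximate sketch of the wrapper pattern. */
  static inline void _reschedule_unlocked(void)
  {
          _reschedule_irqlock(irq_lock());
  }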
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Just like with _Swap(), we need two variants of these utilities which
can atomically release a lock and context switch. The naming shifts
(for byte count reasons) to _reschedule/_pend_curr, and both have an
_irqlock variant which takes the traditional locking.
Just refactoring. No logic changes.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
We want a _Swap() variant that can atomically release/restore a
spinlock state in addition to the legacy irqlock. The function as it
was is now named "_Swap_irqlock()", while _Swap() now refers to a
spinlock and takes two arguments. The former will be going away once
existing users (not that many! Swap() is an internal API, and the
long port away from legacy irqlocking is going to be happening mostly
in drivers) are ported to spinlocks.
Obviously on uniprocessor setups, these produce identical code. But
SMP requires that the correct API be used to maintain the global lock.
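A hedged usage sketch of the new form (prototype approximate): the
caller hands its spinlock and key to _Swap(), which drops the lock
atomically with the context switch; when the thread resumes it no
longer holds the lock.

  /* Approximate usage sketch. */
  static struct k_spinlock my_lock;

  static void block_here(void)
  {
          k_spinlock_key_t key = k_spin_lock(&my_lock);

          /* ... update queues/state under the lock ... */

          (void)_Swap(&my_lock, key);   /* lock released with the switch */
  }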
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This adds a simple implementation of SMP CPU affinity to Zephyr. The
API is simple and doesn't try to invent abstractions like "cpu sets".
Each thread has an enable/disable flag associated with each CPU in the
system, and the bits can be turned on and off (for threads that are
not currently runnable, of course) using an easy three-function API.
Because the implementation picked requires enumerating runnable
threads in priority order looking for one that matches the current CPU,
this is not a good fit for the SCALABLE or MULTIQ scheduler backends,
so it currently can be enabled only for SCHED_DUMB (which is the
default anyway). Fancier algorithms do exist, but even the best of
them scale as O(N_CPUS), so aren't quite constant time and often
require significant memory overhead to keep separate lists for
different cpus/sets.
The intended use here is for apps that want to "pin" threads to
specific CPUs for latency control, or conversely to prevent certain
threads from taking time on specific CPUs to leave them free for fast
response.
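A hedged usage sketch using the names this API has in current Zephyr
(k_thread_cpu_mask_clear/enable/disable); the mask may only be changed
while the thread is not runnable, e.g. a thread created with a
K_FOREVER delay and started afterwards:

  /* Usage sketch; header path varies by tree (<kernel.h> in this era). */
  #include <zephyr/kernel.h>

  extern struct k_thread worker;           /* created with K_FOREVER delay */

  static void pin_worker_to_cpu0(void)
  {
          k_tid_t tid = &worker;

          (void)k_thread_cpu_mask_clear(tid);       /* disallow all CPUs */
          (void)k_thread_cpu_mask_enable(tid, 0);   /* allow only CPU 0 */

          k_thread_start(tid);                      /* now runnable, pinned */
  }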
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Idle threads must (for obvious reasons!) always be preemptible from
the perspective of the scheduler. But when preemptive scheduling is
disabled, they are given a priority of -1, which is the lowest
COOPERATIVE priority. So the scheduler preemption logic needed an
extra test for this case and couldn't just rely on the existing
priority comparison. This was a measurable performance loss, as this
is a hot path on existing benchmarks.
Limit that test to circumstances (!CONFIG_PREEMPT_ENABLED) where it's
actually needed.
Longer term it would be better to just force the existence of one
"preemptible" thread priority always, but right now the number of
priorities and the state of the PREEMPT_ENABLED kconfig flag are
linked, and the existing interrupt return code (with no preemption,
you know with certainty which thread you are returning to and can skip
some work) on some platforms fails when I try this.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
For historical reasons, some architectures had a valid _current thread
pointer at initialization time and others didn't. So the scheduler
logic had a test that checks _current vs. NULL every time it needed to
check preemption, when this was only a workaround for initialization
state.
Fix things so that there is a dummy thread always (and clean up the
code to do a struct assignment instead of a memset of bare memory),
and we can remove that test from the scheduler hot path.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
GCC 6.2.0 is making frustratingly poor inlining decisions with some of
these routines, resulting in an awful lot of runtime calls for code
that is only ever expanded once or twice within the file.
Treat with targeted ALWAYS_INLINE's to force the issue. The
scheduler code is a hot path.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The sys_dlist_insert_*() functions had a behavior where a NULL
argument for the insertion position to sys_dlist_insert_after/before()
was interpreted as "the end of the list". We never used that
convention (except in one spot internal to dlist.h which was not
itself used anywhere), and of course already have an API for appending
and prepending to a list.
In practice this was a performance disaster. The NULL check is
virtually never provable statically by the compiler, so that test and
branch is present always. And worse, the check and call to another
function was pushing this beyond the complexity limit for gcc to
inline a function (at -Os optimization anyway), forcing us to use
function calls for what should be a ~8 instruction sequence. The
upshot is that dlist insertions were 2-3x slower than they needed to
be.
Deprecate these older APIs and introduce a new sys_dlist_insert() call
which can be much better optimized.
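A hedged usage sketch of the new call (in current Zephyr,
sys_dlist_insert(successor, node) links node immediately before
successor), here doing the kind of sorted insert the scheduler needs;
append/prepend keep their dedicated calls:

  /* Usage sketch; the comparison callback is hypothetical. */
  static void sorted_insert(sys_dlist_t *list, sys_dnode_t *node,
                            bool (*less)(sys_dnode_t *a, sys_dnode_t *b))
  {
          sys_dnode_t *pos;

          SYS_DLIST_FOR_EACH_NODE(list, pos) {
                  if (less(node, pos)) {
                          sys_dlist_insert(pos, node);   /* before pos */
                          return;
                  }
          }
          sys_dlist_append(list, node);                  /* largest: append */
  }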
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Whether a timeout is linked into the timeout queue can be determined
from the corresponding sys_dnode_t linked state. This removes the need
to use a special flag value in dticks to determine that the timeout is
inactive.
Update _abort_timeout to return an error code, rather than the flag
value, when the timeout to be aborted was not active.
Remove the _INACTIVE flag value, and replace its external uses with an
internal API function that checks whether a timeout is inactive.
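Illustratively (field and helper names approximate, not the exact
source), the inactive test now reduces to the dnode's linked state:

  /* Approximate sketch. */
  static inline bool timeout_is_inactive(const struct _timeout *t)
  {
          return !sys_dnode_is_linked(&t->node);
  }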
Signed-off-by: Peter A. Bigot <pab@pabigot.com>
CONTAINER_OF() on a NULL pointer returns some offset around NULL and not
another NULL pointer. We have to check for that ourselves.
This only worked because the dnode happened to be at the start of the
struct.
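A small hedged illustration of the pitfall (struct item here is made up
for the example): CONTAINER_OF() just subtracts the member offset, so
applied to a NULL node it yields a bogus non-NULL pointer whenever that
offset is non-zero.

  /* Illustration only; struct item is hypothetical. */
  struct item {
          int flags;
          sys_dnode_t node;       /* NOT the first member: offset != 0 */
  };

  static struct item *head_item(sys_dlist_t *list)
  {
          sys_dnode_t *n = sys_dlist_peek_head(list);   /* may be NULL */

          /* Wrong: CONTAINER_OF(n, ...) on a NULL n is not NULL.
           * Check the node pointer before converting it.
           */
          return (n == NULL) ? NULL : CONTAINER_OF(n, struct item, node);
  }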
Signed-off-by: Peter A. Bigot <pab@pabigot.com>
Timeslicing works by removing the _current thread from the run queue
and re-adding it at the end of its priority. On systems with a
_Swap() that can be preempted by a timer interrupt, that means it's
possible for the timeslice to try to slice out a thread that had
already pended itself!
This behavior used to be benign (or at least undetectable) as the
duplicated list operations were idempotent. But now the dlist code is
stricter about correctness and has exposed the bug -- it will blow up
if you try to remove an already-removed list node.
Fix (on affected platforms) by stashing the _current pointer in
_pend_current_thread() that is checked and cleared in the timer
interrupt. If we discover we're trying to interrupt a thread that's
already interrupted itself, we can safely exit z_time_slice() as a
noop. The timeslicing bookkeeping was already done for us underneath
the pend code.
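Approximately (not the exact source), the interrupt-side check looks
like:

  /* Approximate sketch of the interrupt-side check. */
  void z_time_slice(int ticks)
  {
          if (pending_current == _current) {
                  /* _current pended itself before the swap completed;
                   * the slice bookkeeping was already handled under the
                   * pend, so this interrupt is a no-op.
                   */
                  pending_current = NULL;
                  return;
          }

          /* ... normal timeslice accounting using ticks ... */
  }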
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This is a refactoring of the fix in commit 6c95dafd82 to limit its
application to affected platforms now that the root cause is
understood.
Note that the bug that fix was addressing was rare and seen only
after multi-hour sessions on Michael Scott's test rig. So if
something regresses, this is where to look!
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The recent change that added a locked z_set_timeout_expiry() API
obsoleted the subtle note about synchronization above
reset_time_slice(). None of that matters any more; the API is
synchronized internally in a conventional way.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
MISRA-C says all declarations of an object or function must use the
same name and qualifiers.
MISRA-C rule 8.3
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
The order of evaluation of function calls used as arguments to another
function is unspecified (15-18) / undefined (32) behavior in C99.
MISRA-C rule 13.2 does not allow the value of an expression and its
side effects to depend on such a non-deterministic order, in order to
avoid these undefined behaviors.
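A minimal illustration of the class of problem (not code from the
kernel):

  /* Illustration only; f(), g() and add() are hypothetical. */
  extern int f(void);
  extern int g(void);
  extern int add(int a, int b);

  int total_noncompliant(void)
  {
          /* Whether f() or g() runs first is unspecified; if they share
           * side effects the result is non-deterministic (MISRA-C 13.2).
           */
          return add(f(), g());
  }

  int total_compliant(void)
  {
          int a = f();            /* side effects sequenced explicitly */
          int b = g();

          return add(a, b);
  }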
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
The system must not set the clock expiry via a backdoor, as this may
result in unbounded time drift of all scheduled timeouts.
Fixes: #11502
Signed-off-by: Pawel Dunaj <pawel.dunaj@nordicsemi.no>
The call to z_clock_set_timeout() was being made outside the timeout
lock, which can race against other contexts setting sooner-expiring
timeouts.
Also add a long comment to one spot (timeslicing) where this call is
made outside the timeout spinlock (inside the scheduler lock),
explaining why this is OK.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
If this function is itself interrupted by a timeslice event, the
slicing state can be corrupted. Just re-use the scheduler lock
instead of using a new spinlock; this is a low-latency function that
won't deadlock. Found by inspection.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
There were a struct and a variable both called _kernel. This is
error-prone and a MISRA-C violation. Change the struct to have a
unique identifier.
MISRA-C rule 5.8
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
This commit introduces a k_sleep() return value, which provides
information about the actual sleep time. If the returned value is
non-zero, the thread slept shorter than requested, which is only
possible if the thread has been woken up by a k_wakeup() call.
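A usage sketch (in this era k_sleep() takes a duration in milliseconds;
per its documentation the return value is the unslept remainder, or
zero if the full time elapsed):

  /* Usage sketch; header name per trees of this era. */
  #include <zephyr.h>

  static void nap_one_second(void)
  {
          s32_t left = k_sleep(1000);     /* request 1000 ms */

          if (left != 0) {
                  /* Woken early by k_wakeup(); `left` ms of the request
                   * had not yet elapsed.
                   */
          }
  }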
Signed-off-by: Piotr Zięcik <piotr.ziecik@nordicsemi.no>
In _pend_current_thread the argument key is always an unsigned
integer type, but this function forces it to become a signed
integer. This is dangerous behavior and can't be trusted to
work as expected.
Signed-off-by: Adithya Baglody <adithya.nagaraj.baglody@intel.com>
This API shouldn't take an int type; it should take u32_t instead.
The argument has to match irq_lock() and irq_unlock().
Signed-off-by: Adithya Baglody <adithya.nagaraj.baglody@intel.com>
In tickless mode, not all elapsed ticks may have been announced yet,
so future z_time_slice() calls will include "extra" ticks that we have
to account for when setting up the slice count.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
It's possible to interrupt a thread that has already scheduled a
timeout. Really this is a race against the usage of
_add_thread_timeout() and needs some design work to provide proper
locking (which is a distinct requirement from the scheduler lock and
timeout lock!), as the users of that API are spread around the kernel.
But existing usage always schedules the timeouts first, so this is
safe.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The timeout APIs are properly synchronized now. This irq_lock() (and
the comment explaining it) isn't needed anymore.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Now that the API has been fixed up, replace the existing timeout queue
with a much smaller version. The basic algorithm is unchanged:
timeouts are stored in a sorted dlist with each node holding a delta
time from the previous node in the list; the announce call just walks
this list pulling off the heads as needed (sketched after the list
below). Advantages:
* Properly spinlocked and SMP-aware. The earlier timer implementation
relied on only CPU 0 doing timeout work, and on an irq_lock() being
taken before entry (something that was violated in a few spots).
Now any CPU can wake up for an event (or all of them) and everything
works correctly.
* The *_thread_timeout() API is now expressible as a clean wrapping
(just one-liners) around the lower-level interface based on function
pointer callbacks. As a result the timeout objects no longer need
to store backpointers to the thread and wait_q and have shrunk by
33%.
* MUCH smaller, to the tune of hundreds of lines of code removed.
* Future proof, in that all operations on the queue are now fronted by
just two entry points (_add_timeout() and z_clock_announce()) which
can easily be augmented with fancier data structures.
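As a hedged sketch of that shape (structure and names are illustrative,
not the exact kernel source):

  /* Illustrative sketch of a delta list and its announce walk. */
  struct timeout_node {
          sys_dnode_t node;
          s32_t dticks;                         /* ticks relative to previous node */
          void (*fn)(struct timeout_node *t);   /* expiry callback */
  };

  static void announce(sys_dlist_t *list, s32_t elapsed)
  {
          sys_dnode_t *n;

          while ((n = sys_dlist_peek_head(list)) != NULL) {
                  struct timeout_node *t =
                          CONTAINER_OF(n, struct timeout_node, node);

                  if (t->dticks > elapsed) {
                          t->dticks -= elapsed;   /* charge remainder to new head */
                          return;
                  }

                  elapsed -= t->dticks;           /* consume this node's delta */
                  sys_dlist_remove(n);
                  t->fn(t);                       /* fire the expired timeout */
          }
  }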
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Now that this is known to be an unused value, remove it from the API.
Note that this caught a few spots where we were passing values (a
non-NULL wait_q with a NULL thread handle) that were always being
ignored before.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>