The only user of arch_mem_domain_destroy was the deprecated
k_mem_domain_destroy function which has now been removed. So remove
arch_mem_domain_destroy as well.
Signed-off-by: Kumar Gala <kumar.gala@linaro.org>
Remove k_mem_domain_destroy and k_mem_domain_remove_thread as they've
been deprecated for at least 2 releases now.
Signed-off-by: Kumar Gala <kumar.gala@linaro.org>
The internal function z_smp_reacquire_global_lock() is not used
anywhere inside the Zephyr code, so remove it.
Fixes #33273.
Signed-off-by: Enjia Mai <enjiax.mai@intel.com>
The static device dependencies from devicetree are not the only ones
that might be present at runtime. Add API that allows visiting
required devices without assuming that handles for or pointers to them
can be accessed as a static contiguous sequence.
Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
Wrap arch_sched_ipi() call in z_thread_abort() with ifdef checking for
hardware support of IPI.
Fixes #32723
Signed-off-by: Lauren Murphy <lauren.murphy@intel.com>
Previously, a racing write to the provided string could result
in up to CONFIG_THREAD_MAX_NAME_LEN-2 bytes after the end
of user-accessible memory being leaked into the thread name.
For now, make a temporary copy. In an ideal world this could
copy directly from userspace into the thread name, but that
violates the current vrfy / impl split.
Signed-off-by: James Harris <james.harris@intel.com>
The xtensa atomics layer was written with hand-coded assembly that had
to be called as functions. That's needlessly slow, given that the low
level primitives are a two-instruction sequence. Ideally the compiler
should see this as an inline to permit it to better optimize around
the needed barriers.
There was also a bug with the atomic_cas function, which had a loop
internally instead of returning the old value synchronously on a
failed swap. That's benign right now because our existing spin lock
does nothing but retry it in a tight loop anyway, but it's incorrect
per spec and would have caused a contention hang with more elaborate
algorithms (for example a spinlock with backoff semantics).
Remove the old implementation and replace with a much smaller inline C
one based on just two assembly primitives.
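To illustrate the compare-and-swap contract described above, here is a minimal sketch using C11 `<stdatomic.h>` rather than the Xtensa primitives. The names `cas_once` and `try_lock_with_backoff` are hypothetical and are not part of this patch; the point is only that a single CAS attempt reports failure to the caller, which is what lets a backoff policy exist at all.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Single-attempt compare-and-swap: reports failure instead of retrying. */
static bool cas_once(atomic_int *target, int old_value, int new_value)
{
	/* The C11 call overwrites "expected" on failure, so use a local copy. */
	int expected = old_value;

	return atomic_compare_exchange_strong(target, &expected, new_value);
}

/* Hypothetical lock with backoff; only possible if CAS returns on failure. */
static bool try_lock_with_backoff(atomic_int *lock, int max_attempts)
{
	for (int attempt = 0; attempt < max_attempts; attempt++) {
		if (cas_once(lock, 0, 1)) {
			return true;
		}
		/* Back off: spin a little longer after each failed attempt. */
		for (volatile int spin = 0; spin < (1 << attempt); spin++) {
		}
	}
	return false;
}
```

If the CAS looped internally until it succeeded, `try_lock_with_backoff` would never get the chance to back off or give up.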
This patch also contains a little bit of refactoring to address the
scheme has been split out into a separate header for each, and the
ATOMIC_OPERATIONS_CUSTOM kconfig has been renamed to
ATOMIC_OPERATIONS_ARCH to better capture what it means.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Commit 6b84ab3830 ("kernel/sched: Adjust locking in z_swap()") moved
the call to arch_cohere_stacks() out of the scheduler lock while doing
some reorganizing. On further reflection, this is incorrect. When
done outside the lock, the two arch_cohere_stacks() calls will race
against each other.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Zephyr docs state that timers will act as one-shot timers when started
with a period of K_NO_WAIT or K_FOREVER. However the code adjusting
period was setting K_FOREVER timeout ticks to 1 which caused the timer
to expire every tick. This adds a check to not adjust K_FOREVER periods.
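A simplified model of the check described above, not the kernel code itself. `TOY_TICKS_FOREVER` stands in for the tick value of K_FOREVER, and the "subtract one tick, minimum one" adjustment for ordinary periods is an assumption made for illustration.

```c
#include <stdint.h>

/* Stand-in for the tick value carried by K_FOREVER (assumption). */
#define TOY_TICKS_FOREVER ((int64_t)-1)

/* Hypothetical helper modelling the period adjustment. */
static int64_t toy_adjust_period(int64_t period_ticks)
{
	/* 0 (K_NO_WAIT) and "forever" both mean one-shot: leave untouched. */
	if (period_ticks == 0 || period_ticks == TOY_TICKS_FOREVER) {
		return period_ticks;
	}

	/* Assumed adjustment for real periods: never drop below one tick. */
	int64_t adjusted = period_ticks - 1;

	return adjusted < 1 ? 1 : adjusted;
}
```

Without the early return, the "forever" value would fall into the clamp and come out as a one-tick period, which is the bug this commit describes.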
Signed-off-by: Eric Johnson <eric@liveathos.com>
pm_system_resume is always implemented when PM is enabled. There is no
need to have this weak function under an ifdef PM.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
pm_system_resume_from_deep_sleep is not implemented or used
anywhere. Just remove it and keep the code base cleaner.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
This function is useless and the state variable that it was
controlling is also not necessary because the same logic is being
handled by the variable post_ops_done.
This reasonably simplifies idle thread logic.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
pm_system_suspend is called only from the idle thread and should
not be exported as a public API.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
Previously, a k_sem_reset with any outstanding waiting threads would
result in the semaphore in an inconsistent state, with more threads
waiting in the wait_q than the count would indicate.
Explicitly -EAGAIN any waiting threads upon k_sem_reset, to
ensure safety here.
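The reset behaviour can be modelled with a toy semaphore, sketched below. All types and names here (`toy_sem`, `toy_waiter`, `toy_sem_reset`) are hypothetical; the real kernel uses wait queues and the scheduler rather than a plain array.

```c
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_WAITERS 4

struct toy_waiter {
	int result;  /* value the blocked "take" call will return */
	int pending; /* 1 while the waiter is blocked on the semaphore */
};

struct toy_sem {
	unsigned int count;
	struct toy_waiter *waiters[TOY_MAX_WAITERS];
	size_t num_waiters;
};

/* Reset the count and wake every waiter with -EAGAIN. */
static void toy_sem_reset(struct toy_sem *sem)
{
	for (size_t i = 0; i < sem->num_waiters; i++) {
		sem->waiters[i]->result = -EAGAIN;
		sem->waiters[i]->pending = 0;
		sem->waiters[i] = NULL;
	}
	sem->num_waiters = 0;
	sem->count = 0;
}
```

After the reset the wait list is empty and the count is zero, so the two can no longer disagree.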
Signed-off-by: James Harris <james.harris@intel.com>
Currently there is no way to distinguish between a caller
explicitly asking for a semaphore with a limit that
happens to be `UINT_MAX` and a semaphore that just
has a limit "as large as possible".
Add `K_SEM_MAX_LIMIT`, currently defined to `UINT_MAX`, and akin
to `K_FOREVER` versus just passing some very large wait time.
In addition, the `k_sem_*` APIs were type-confused, where
the internal data structure was `uint32_t`, but the APIs took
and returned `unsigned int`. This changes the underlying data
structure to also use `unsigned int`, as changing the APIs
would be a (potentially) breaking change.
These changes are backwards-compatible, but it is strongly suggested
to take a quick scan for `k_sem_init` and `K_SEM_DEFINE` calls with
`UINT_MAX` (or `UINT32_MAX`) and replace them with `K_SEM_MAX_LIMIT`
where appropriate.
Signed-off-by: James Harris <james.harris@intel.com>
Due to recent changes to the scheduler, z_find_first_thread_to_unpend
and z_remove_thread_from_ready_q are no longer used. Remove the
dead code.
fixes: #32691
Signed-off-by: Spoorthy Priya Yerabolu <spoorthy.priya.yerabolu@intel.com>
While I'm in the idle code, let's clean this loop up. It was a really
bad #ifdef hell:
* Remove the CONFIG_TICKLESS_IDLE_THRESH logic (and the kconfig),
which never did anything but needlessly increase latency.
* Move the needed timeout logic from the main loop into
pm_save_idle(), which eliminates the special case for
!SYS_CLOCK_EXISTS.
Behavior (modulo that one kconfig) should be completely unchanged, and
now the inner part of the idle loop looks like:
    while (true) {
        (void) arch_irq_lock();
        if (IS_ENABLED(CONFIG_PM)) {
            pm_save_idle();
        } else {
            k_cpu_idle();
        }
    }
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The removal of the abort handling also absconded with an IRQ lock that
is required for reliable operation in the idle loop. Put it back.
Once the idle loop has made a decision to enter idle, any interrupt
that arrives needs to be masked and delivered AFTER the system enters
idle. Otherwise we run the risk of races where the system accepts and
processes an interrupt that should have prevented idle, but then goes
to sleep anyway having already made the decision.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Now that the old API has been reimplemented with the new API remove
the old implementation and its tests.
Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
Switch the default and clean up some test workarounds. This will enable
final conversions necessary to transition to the new API.
Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
This commit provides a complete reimplementation of the work queue
infrastructure intended to eliminate the race conditions and feature
gaps in the existing implementation.
Both bare and delayable work structures are supported. Items can be
submitted; delayable items can be scheduled for submission at a future
time. Items can be delayed, queued, and running all at the same time.
A running item can also be canceling.
The new implementation:
* replaces "pending" with "busy" which identifies the active states;
* supports canceling delayed and submitted items;
* prevents resubmission of an item being canceled until cancellation
completes;
* supports waiting for cancellation to complete;
* supports flushing a work item (waiting for the last submission to
complete without preventing resubmission);
* supports waiting for a queue to drain (only allows resubmission from
the work thread);
* supports stopping a work queue in conjunction with draining it;
* prevents handler-reentrancy during resubmission.
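One way to picture the "busy" concept from the list above is as a bitmask of independent states. The flag names below are hypothetical and chosen for illustration; they are not taken from this commit.

```c
#include <stdbool.h>

/* Hypothetical state bits; an item may hold several at once. */
enum toy_work_flags {
	TOY_WORK_DELAYED   = 1 << 0, /* scheduled for future submission */
	TOY_WORK_QUEUED    = 1 << 1, /* waiting in a work queue */
	TOY_WORK_RUNNING   = 1 << 2, /* handler currently executing */
	TOY_WORK_CANCELING = 1 << 3, /* cancellation in progress */
};

/* "Busy" means any active state bit is set. */
static bool toy_work_is_busy(unsigned int flags)
{
	return flags != 0;
}

/* Resubmission is refused while a cancellation is still in progress. */
static bool toy_work_can_submit(unsigned int flags)
{
	return (flags & TOY_WORK_CANCELING) == 0;
}
```

Because the bits are independent, an item can be delayed, queued, and running at the same time, and a running item can also be canceling.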
Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
Attempts to reimplement the existing work API using a new work
implementation failed, primarily due to heavy use of whitebox testing
in validating the original API. Add a temporary Kconfig that will
select between the two implementations so we can use the same
identifiers but select which implementation they reference.
This commit just adds the selection infrastructure and uses it to
conditionalize the existing implementation in anticipation of the new
one in the next commit.
Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
These functions are a subset of proposed public APIs to clean up
several issues related to safely handling waking of threads. They
have been made private as their interface may change, but their use
will simplify the reimplementation of the k_work functionality.
See: https://github.com/zephyrproject-rtos/zephyr/pull/29668
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
Several internal APIs wrote thread attributes (return value, mainly)
_after_ calling `z_ready_thread`. This is unsafe, at least in SMP,
because another core could have already picked up and run the thread.
Fixes #32800.
Signed-off-by: James Harris <james.harris@intel.com>
`z_impl_k_yield` unlocked sched_spinlock, only to lock it again
immediately, do a little bit more work, then unlock it again.
This causes performance issues on SMP, where `sched_spinlock`
is often fairly highly contended and cores often end up spinning
for quite a while waiting to retake the lock in `z_swap_unlocked`.
Instead directly pass the spinlock key to `z_swap` and avoid the
extra lock+unlock.
Signed-off-by: James Harris <james.harris@intel.com>
`z_is_t1_higher_prio_than_t2` was being called twice in both the
context-switch fastpath and in `z_priq_rb_lessthan`, just to
deal with priority ties. In addition, the API was error-prone
(and too much in the fastpath to be able to assert its invariants)
- see also #32710 for a previous example of this API breaking
and returning a>b but also b>a.
Replacing this with a direct 3-way comparison `z_cmp_t1_prio_with_t2`
sidesteps most of these issues. There is still a concern that
`sgn(z_cmp_t1_prio_with_t2(a,b)) != -sgn(z_cmp_t1_prio_with_t2(b,a))`
but I don't see any way to alleviate this aside from adding an
assert to the fastpath.
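A minimal sketch of a three-way priority comparison, assuming a simplified thread with only a priority and a deadline. `toy_thread` and `toy_cmp_prio` are hypothetical names, and deadline wraparound is deliberately ignored here.

```c
#include <stdint.h>

struct toy_thread {
	int prio;          /* lower value means higher priority */
	uint32_t deadline; /* earlier deadline wins on a priority tie */
};

/* Returns >0 if t1 should run first, <0 if t2 should, 0 on a true tie. */
static int toy_cmp_prio(const struct toy_thread *t1,
			const struct toy_thread *t2)
{
	if (t1->prio != t2->prio) {
		return t1->prio < t2->prio ? 1 : -1;
	}
	if (t1->deadline != t2->deadline) {
		/* Plain comparison: no wraparound handling in this sketch. */
		return t1->deadline < t2->deadline ? 1 : -1;
	}
	return 0;
}
```

A single call answers "higher", "lower", or "equal", so callers no longer need two boolean calls to detect a tie, and identical threads compare as equal in both directions.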
Signed-off-by: James Harris <james.harris@intel.com>
Previously two tasks with the same deadline and priority would
always have `z_is_t1_higher_prio_than_t2` `true` in both directions.
This is logically inconsistent, and results in `k_yield` not actually
yielding between identical threads.
Signed-off-by: James Harris <james.harris@intel.com>
Add a newer, much smaller and simpler implementation of abort and
join. No need to involve the idle thread. No need for a special code
path for self-abort. Joining a thread and waiting for an aborting one
to terminate elsewhere share an implementation. All work in both
calls happens under a single locked path with no unexpected
synchronization points.
This fixes a bug with the current implementation where the action of
z_sched_single_abort() was nonatomic, releasing the lock internally at
a point where the thread to be aborted could self-abort and confuse
the state such that it failed to abort at all.
Note that the arm32 and native_posix architectures, which have their
own thread abort implementations, now see a much simplified
"z_thread_abort()" internal API.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
THIS COMMIT DELIBERATELY BREAKS BISECTABILITY FOR EASE OF REVIEW.
SKIP IF YOU LAND HERE.
Remove the existing implementation of k_thread_abort(),
k_thread_join(), and the attendant facilities in the thread subsystem
and idle thread that support them.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This function would correctly suppress attempts to set timeouts that
were too soon for the driver or farther out than what was already set,
but when it actually set the timeout it would use the requested value
and not clamp it to the minimum of it and the current timeout
expiration, leading to "too-long" timeouts being set at the driver.
In uniprocessor configurations, that turns out to have been benign
because something else would always come back along when timeout state
changed and fix the broken value before the expiration.
But in SMP, this opens up races. For example, the idle thread on one
CPU can see that there are no active threads and schedule a maximum
value timeout at the same time as the other thread adds a new timeout
that expects a near-term expiration. The broken code here would see
that the new timeout exists, decide that yes it needs to override, but
then set the K_TICKS_FOREVER value it got from the idle thread!
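The intended clamp can be sketched as follows. This is a model under stated assumptions: "forever" is represented as `INT32_MAX` so that a plain minimum works, which is not how the kernel encodes K_TICKS_FOREVER, and `toy_pick_expiry` is a hypothetical name.

```c
#include <stdint.h>

/* Assumption for this sketch only: "forever" is the largest tick count. */
#define TOY_TICKS_FOREVER INT32_MAX

/* Program the driver with the sooner of the request and the next timeout. */
static int32_t toy_pick_expiry(int32_t requested, int32_t next_timeout)
{
	return requested < next_timeout ? requested : next_timeout;
}
```

In the SMP race described above, the idle thread's request of "forever" combined with a newly added five-tick timeout yields five ticks, not "forever".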
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
When the kernel is TICKLESS, timeouts are set as needed, and drivers
all have some minimum amount of time before which they can reliably
schedule an interrupt. When this happens, drivers will kick the
requested interrupt out by one tick. This means that it's not
reliably possible to get a timeout set for "one tick in the
future"[1].
And attempting to do that is dangerous anyway. If the driver will
delay a one-tick interrupt, then code that repeatedly tries to
schedule an imminent interrupt may end up in a state where it is
constantly pushing the interrupt out into the future, and timer
interrupts stop arriving! The timeout layer actually has protection
against this case.
Finally getting to the point: in recent changes, the timeslice layer
lost its integration with the "imminent" test in the timeout code, so
it's now able to run into this situation: very rapidly context
switching code (or rapidly arriving interrupts) will have the effect
of infinitely[2] delaying timeouts and stalling the whole timeout
subsystem.
Don't try to be fancy. Just clamp timeslice duration such that a
slice is 2 ticks at minimum and we'll never hit the problem. Adjust
the two tests that were explicitly requesting very short slice rates.
[1] Of course, the tradeoff is that the tick rate can be 100x higher
or more, so on balance tickless is a huge win.
[2] Actually it only lasts until a 31 bit signed rollover in the HPET
cycle count in practice.
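A rough sketch of the two-tick clamp, assuming slices are configured in milliseconds and converted to ticks by rounding up. The function name, the tick-rate parameter, and the treatment of zero as "slicing disabled" are assumptions of this sketch.

```c
#include <stdint.h>

#define TOY_MIN_SLICE_TICKS 2

/* Convert a slice length in ms to ticks, clamped to the two-tick minimum. */
static int32_t toy_slice_ticks(int32_t slice_ms, int32_t ticks_per_sec)
{
	if (slice_ms <= 0) {
		return 0; /* time slicing disabled in this model */
	}

	/* Ceiling division so short slices never round down to zero. */
	int64_t ticks = ((int64_t)slice_ms * ticks_per_sec + 999) / 1000;

	return ticks < TOY_MIN_SLICE_TICKS ? TOY_MIN_SLICE_TICKS : (int32_t)ticks;
}
```

With a 100 Hz tick, a 1 ms slice would otherwise become a single tick, which is exactly the "one tick in the future" request that a tickless driver cannot reliably honor.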
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Recent work to normalize use of the thread QUEUED state bit means that
we never attempt to remove unqueued threads from the low-level run
queue. So the old workaround for SWAP_NONATOMIC that was trying to
detect this condition isn't necessary anymore.
Which is serendipitous, because it was written to encode some very
specific logic about the circumstances where _current could be
dequeued that I'd like to be able to break.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This is part of the scheduler API, and was always just a synchronized
wrapper around the internal ready_thread() function. But where the
internal users seem to be careful not to call it on threads that are
not known to be already queued or running, the general users in the
IPC code seem to be less strict.
Add a simple test to detect the case where a thread is already
running. Right now this just loops over the array of CPUs, so is O(N)
in the CPU count even though N is never more than four for us
currently. But this is possible without modifying data structures. A
more scalable way to do this if we ever need to run on very parallel
systems would be to use another state bit for RUNNING, or to keep a
backpointer in the thread struct to the CPU it's running on, etc...
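The O(N) scan over CPUs might look roughly like this. The structures are stripped-down stand-ins for the kernel's per-CPU data, and `toy_thread_is_running` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_CPUS 4

struct toy_thread {
	int id;
};

struct toy_cpu {
	struct toy_thread *current; /* thread running on this CPU, or NULL */
};

/* Linear scan: true if any CPU currently runs the given thread. */
static bool toy_thread_is_running(const struct toy_cpu cpus[TOY_NUM_CPUS],
				  const struct toy_thread *thread)
{
	for (size_t i = 0; i < TOY_NUM_CPUS; i++) {
		if (cpus[i].current == thread) {
			return true;
		}
	}
	return false;
}
```

The scan needs no extra field in the thread struct, which is the tradeoff the message describes: linear in the CPU count, but no data structure changes.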
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Swap was originally written to use the scheduler lock just to select a
new thread, but it would be nice to be able to rely on scheduler
atomicity later in the process (in particular it would be nice if the
assignment to cpu.current could be seen atomically). Rework the code
a bit so that swap takes the lock itself and holds it until just
before the call to arch_switch().
Note that the local interrupt mask has always been required to be held
across the swap, so extending the lock here has no effect on latency
at all on uniprocessor setups, and even on SMP only affects average
latency and not worst case.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Aborted threads will cancel their timeouts, but the timeout subsystem
isn't protected under the same lock so it's possible for a timeout to
fire just as a thread is being aborted and wake it up unexpectedly.
Check the state before blowing anything up.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This got missed, leaving garbage there for restarted threads to trip
on. Actually I see multiple uninitialized fields, which seems odd.
This code deserves some rework, thread initialization isn't a
performance path and we should probably be zeroing the struct out.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Remove duplication in the code by moving macro LOCKED() to the correct
kernel_internal.h header.
Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
This adds a new kconfig CONFIG_SRAM_OFFSET to specify the offset
from beginning of SRAM where the kernel begins. On x86 and
PC compatible platforms, the first 1MB of RAM is reserved and
Zephyr should not link anything there. However, this 1MB still
needs to be mapped by the MMU to access various platform related
information. CONFIG_SRAM_OFFSET serves a similar function to
CONFIG_KERNEL_VM_OFFSET and is needed for proper phys/virt
address translations.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
The Z_BOOT_VIRT_TO_PHYS() and Z_BOOT_PHYS_TO_VIRT() address
translation macros are flipped in their calculations.
The calculation is supposed to be:
    virt = phys + ((KERNEL_VM_BASE + KERNEL_VM_OFFSET) -
                   SRAM_BASE_ADDRESS)
So fix them.
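The formula above can be checked with a small standalone sketch. The `TOY_` macros and the address values are made up for illustration; the real values come from the build configuration.

```c
#include <stdint.h>

/* Illustrative values only (assumptions, not taken from any board). */
#define TOY_SRAM_BASE_ADDRESS 0x00100000UL
#define TOY_KERNEL_VM_BASE    0x80000000UL
#define TOY_KERNEL_VM_OFFSET  0x00100000UL

/* Distance between the virtual and physical views of the kernel image. */
#define TOY_VM_DELTA \
	((TOY_KERNEL_VM_BASE + TOY_KERNEL_VM_OFFSET) - TOY_SRAM_BASE_ADDRESS)

/* virt = phys + delta, and the inverse subtracts the same delta. */
#define TOY_PHYS_TO_VIRT(phys) ((uintptr_t)(phys) + TOY_VM_DELTA)
#define TOY_VIRT_TO_PHYS(virt) ((uintptr_t)(virt) - TOY_VM_DELTA)
```

A quick sanity check is that the two macros are inverses of each other and that the start of SRAM maps to `KERNEL_VM_BASE + KERNEL_VM_OFFSET`.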
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
The computation was using the already-adjusted input value that
assumed relative timeouts and not the actual argument the user passed.
Absolute timeouts were consistently waking up one tick early.
Fixes #32499
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Following the idiom used for system calls, add script support to read
the initial application binary to identify which devices are defined,
and to use their offset in the device array as their unique handle
rather than the externally-defined ordinal from devicetree. The
device dependency arrays are updated to use these handles.
Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
Move the busy status from a global atomic bit sequence to atomic flags
in the device PM state. While this temporarily adds 4 bytes to each
PM structure the whole device PM infrastructure will be refactored and
it's likely the extra memory can be recovered.
Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
Separate the state indicator of whether the initialization function
has been invoked from the success or failure of the initialization.
This allows precise confirmation that the device is ready (i.e. it has
been initialized, and that initialization succeeded).
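A minimal model of keeping "init was invoked" separate from "init succeeded". The struct layout and every `toy_` name are assumptions for illustration, not the actual device state definition.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_device_state {
	uint8_t init_res; /* 0 on success, non-zero if init failed */
	bool initialized; /* true once the init function has been invoked */
};

/* Example init functions used to exercise the model. */
static int toy_init_ok(void) { return 0; }
static int toy_init_fail(void) { return -1; }

/* Invoke the init function and record both facts independently. */
static void toy_device_run_init(struct toy_device_state *state,
				int (*init_fn)(void))
{
	int rc = init_fn();

	state->init_res = (rc == 0) ? 0 : 1;
	state->initialized = true;
}

/* Ready means: init has run AND it reported success. */
static bool toy_device_is_ready(const struct toy_device_state *state)
{
	return state->initialized && state->init_res == 0;
}
```

With a single combined flag, a device whose init has not run yet and a device whose init failed would look the same; two fields keep those cases distinct.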
Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
This avoids the need for a distinct object that uses flash to store its
initializer. Instead the state is initialized when the kernel is
starting up, before anything can reference it. In future refactoring
the PM state could be accessed directly without storing an extra
pointer in the static device state.
Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
Initialize all device objects in a batch before invoking any code that
might try to reference data in them. This eliminates a race condition
enabled by the ability to resolve a device structure at build time,
and reference it from one device's initialization routine before the
device itself has been initialized.
While the device is pulled from the sys_init records rather than
static devices, all in-tree init_entry records that are associated
with devices are produced via Z_DEVICE_DEFINE(), so there should be no
static devices that would be missed by instead iterating over the
device records.
Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
Some recent changes exposed some common "arch_switch() anti-patterns"
in various architectures. The documentation technically described
this all correctly, but probably wasn't as clear as it should have
been. Rewrite, making clear exactly what needs to happen and how the
fields should be interpreted.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
It was possible with pathological timing (see below) for the scheduler
to pick a cycle of threads on each CPU and enter the context switch
path on all of them simultaneously.
Example:
* CPU0 is idle, CPU1 is running thread A
* CPU1 makes high priority thread B runnable
* CPU1 reaches a schedule point (or returns from an interrupt) and
decides to run thread B instead
* CPU0 simultaneously takes its IPI and returns, selecting thread A
Now both CPUs enter wait_for_switch() to spin, waiting for the context
switch code on the other thread to finish and mark the thread
runnable. So we have a deadlock, each CPU is spinning waiting for the
other!
Actually, in practice this seems not to happen on existing hardware
platforms, it's only exercisable in emulation. The reason is that the
hardware IPI time is much faster than the software paths required to
reach a schedule point or interrupt exit, so CPU1 always selects the
newly scheduled thread and no deadlock appears. I tried for a bit to
make this happen with a cycle of three threads, but it's complicated
to get right and I still couldn't get the timing to hit correctly. In
qemu, though, the IPI is implemented as a Unix signal sent to the
thread running the other CPU, which is far slower and opens the window
to see this happen.
The solution is simple enough: don't store the _current thread in the
run queue until we are on the tail end of the context switch path,
after wait_for_switch() and going to reach the end in guaranteed time.
Note that this requires changing a little logic to handle the yield
case: because we can no longer rely on _current's position in the run
queue to suppress it, we need to do the priority comparison directly
based on the existing "swap_ok" flag (which has always meant
"yielded", and maybe should be renamed).
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The QUEUED state flag was managed separately from the run queue
insertion/deletion, and the logic (while AFAICT perfectly correct) was
tangled in a few places trying to keep them in sync. Put the
management of both behind a queue_thread()/dequeue_thread() API for
clarity. The ALWAYS_INLINE usage seems to be working to get the
compiler to condense the resulting multiple assignments. No behavior
change.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The "null out the switch handle and put it back" code in the swap
implementation is a holdover from some defensive coding (not wanting
to break the case where we picked our current thread), but it hides a
subtle SMP race: when that field goes NULL, another CPU that may have
selected that thread (which is to say, our current thread) as its next
to run will be spinning on that to detect when the field goes
non-NULL. So it will get the signal to move on when we revert the
value, when clearly we are still running on the stack!
In practice this was found on x86 which poisons the switch context
such that it crashes instantly.
Instead, be firm about state and always set the switch handle of a
currently running thread to NULL immediately before it starts running:
right before entering arch_switch() and symmetrically on the interrupt
exit path.
Fixes #28105
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Some legacy spots in our IPC layer (legally) pass a NULL wait queue to
pend(). Allow this in the coherence assertion.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The poll code uses a dummy wait queue so the threads have something to
block on, but the previous coherence pass (which rearranged things to
put the _poller data elsewhere) missed that this was on the stack,
which is not allowed. It actually has no use except as a list, so
make it a global static instead.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The z_swap_unlocked() function used a dummy spinlock for simplicity.
But this runs afoul of checking for stack-resident spinlocks
(forbidden when KERNEL_COHERENCE is set). And it's executing needless
code to release the lock anyway. Replace with a compile time NULL,
which will improve performance, correctness and code size.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The two calls to unpend a thread from a wait queue were inexplicably*
unsynchronized, as James Harris discovered. Rework them to call the
lowest level primitives so we can wrap the process inside the scheduler
lock.
Fixes #32136
* I took a brief look. What seems to have happened here is that these
were originally synchronized via an implicit lock from an outer caller
(remember the original uniprocessor irq_lock() API is a recursive
lock), and they were mostly implemented in terms of middle-level
calls that were themselves locked. So those got ported over to the
newer spinlock but the outer wrapper layer got forgotten.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This lets the linker tell us what kind of alignment is required
for both tdata and tbss data when copying them into stack.
If they are not aligned as expected by the toolchain, generated
code would be accessing incorrect location for thread variables.
Fixes #32015
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
The linker script defines `z_mapped_size` as follows:
```
z_mapped_size = z_mapped_end - z_mapped_start;
```
This is done with the belief that precomputed values at link time will
make the code smaller and faster.
On Aarch64, symbol values are relocated and loaded relative to the PC
as those are normally meant to be memory addresses.
Now if you have e.g. `CONFIG_SRAM_BASE_ADDRESS=0x2000000000` then
`z_mapped_size` might still have a reasonable value, say 0x59334.
But, when interpreted as an address, that's very very far from the PC
whose value is in the neighborhood of 0x2000000000. That overflows the
4GB relocation range:
```
kernel/libkernel.a(mmu.c.obj): in function `z_mem_manage_init':
kernel/mmu.c:527:(.text.z_mem_manage_init+0x1c):
relocation truncated to fit: R_AARCH64_ADR_PREL_PG_HI21
```
The solution is to define `Z_KERNEL_VIRT_SIZE` in terms of
`z_mapped_end - z_mapped_start` at the source code level. Given this
is used within loops that already start with `z_mapped_start` anyway,
the compiler is smart enough to combine the two occurrences and
dispense with a size counter, making the code effectively
slightly better for all while avoiding the Aarch64 relocation
overflow:
```
   text    data     bss     dec     hex filename
   1216       8  294936  296160   484e0 mmu.c.obj.arm64.before
   1212       8  294936  296156   484dc mmu.c.obj.arm64.after
   1110       8    9244   10362    287a mmu.c.obj.x86-64.before
   1106       8    9244   10358    2876 mmu.c.obj.x86-64.after
```
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
The SYS_CLOCK_TICKS_PER_SEC default may depend on the kernel config
for tickless, rather than the capability.
Signed-off-by: Martin Åberg <martin.aberg@gaisler.com>
Activating K_FP_REGS flags introduces stack memory
overhead for the main thread in Cortex-M architecture.
Several ARM platforms experience main thread stack
overflows when building with FPU_SHARING=y.
Enabling FPU sharing in main thread should not be
the default configuration. Users are welcome to
enable FP sharing on the main thread in the
application code, in main().
This reverts commit 8453a73ede.
Signed-off-by: Ioannis Glaropoulos <Ioannis.Glaropoulos@nordicsemi.no>
The call to arch_mem_coherent() inside spinlock.h, made when both
spinlock validation and memory coherence are enabled, causes a
build error because spinlock.h does not include
kernel_arch_func.h directly. However, simply including
that file does not work either, as it creates a
chicken-and-egg problem in the chain of include files.
In order to make spinlock validation work with kernel
coherence enabled, a separate function is created
to break the circular dependency between include files.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
There was an edge case in the timeout handling (exposed by, but not
strictly related to, the recent timeslice fix): the next_timeout()
computation would include time slice expiration as a clamp on the
result, but this would be invoked also on the z_set_timeout_expiry()
path which gets hooked on entry to a new thread which is needed to set
the timeout in the first place. So if no other timer interrupt was
scheduled, it was possible to miss the first timeslice interrupt after
thread scheduling.
The explanation is much longer than the fix (use <= as the comparator
instead of <).
In practice this was only being hit in the existing test suite on
riscv miv running under renode using non-default clock rates.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Fix an edge case that snuck in with the recent fix: if timeslicing is
enabled, the CPU's slice_ticks will be zero, and thus match a timeout
object's dticks value of zero, and thus get suppressed (because "we
already have a timeout scheduled for that") incorrectly.
Fixes #31789
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
More and more tests are failing due to stack overflow.
Increase MAIN_STACK_SIZE to fix those issues.
Signed-off-by: Alexandre Bourdiol <alexandre.bourdiol@st.com>
Time slices don't have an associated timeout struct and are not stored
in timeout_list. The time slice timeout is programmed directly in the
system clock and tracked in _current_cpu->slice_ticks.
There is an issue where the time slice timeout can be missed because
the system clock is re-programmed to a longer timeout. For this to
happen, it is only necessary that timeout_list is empty (no timeout set)
and that a new timeout longer than the remaining time slice is set. This
is caused by z_add_timeout not checking the slice ticks.
The following example reproduces the issue:
    K_THREAD_STACK_DEFINE(tstack, STACK_SIZE);
    K_THREAD_STACK_ARRAY_DEFINE(tstacks, NUM_THREAD, STACK_SIZE);
    K_SEM_DEFINE(sema, 0, NUM_THREAD);

    static inline void spin_for_ms(int ms)
    {
        uint32_t t32 = k_uptime_get_32();

        while (k_uptime_get_32() - t32 < ms) {
        }
    }

    static void thread_time_slice(void *p1, void *p2, void *p3)
    {
        printk("thread[%d] - Before spin\n", (int)(uintptr_t)p1);

        /* Spinning for longer than slice */
        spin_for_ms(SLICE_SIZE + 20);

        /* The following print should not happen before another
         * same priority thread starts.
         */
        printk("thread[%d] - After spinning\n", (int)(uintptr_t)p1);
        k_sem_give(&sema);
    }

    void main(void)
    {
        k_tid_t tid[NUM_THREAD];
        struct k_thread t[NUM_THREAD];
        uint32_t slice_ticks = k_ms_to_ticks_ceil32(SLICE_SIZE);
        int old_prio = k_thread_priority_get(k_current_get());

        /* disable timeslice */
        k_sched_time_slice_set(0, K_PRIO_PREEMPT(0));

        for (int j = 0; j < 2; j++) {
            k_sem_reset(&sema);

            /* update priority for current thread */
            k_thread_priority_set(k_current_get(), K_PRIO_PREEMPT(j));

            /* synchronize to tick boundary */
            k_usleep(1);

            /* create delayed threads with equal preemptive priority */
            for (int i = 0; i < NUM_THREAD; i++) {
                tid[i] = k_thread_create(&t[i], tstacks[i], STACK_SIZE,
                                         thread_time_slice, (void *)i, NULL,
                                         NULL, K_PRIO_PREEMPT(j), 0,
                                         K_NO_WAIT);
            }

            /* enable time slice (and reset the counter!) */
            k_sched_time_slice_set(SLICE_SIZE, K_PRIO_PREEMPT(0));

            /* Spins for while to spend this thread time but not longer */
            /* than a slice. This is important */
            spin_for_ms(100);

            printk("before sleep\n");

            /* relinquish CPU and wait for each thread to complete */
            k_sleep(K_TICKS(slice_ticks * (NUM_THREAD + 1)));
            for (int i = 0; i < NUM_THREAD; i++) {
                k_sem_take(&sema, K_FOREVER);
            }

            /* test case teardown */
            for (int i = 0; i < NUM_THREAD; i++) {
                k_thread_abort(tid[i]);
            }

            /* disable time slice */
            k_sched_time_slice_set(0, K_PRIO_PREEMPT(0));
        }
        k_thread_priority_set(k_current_get(), old_prio);
    }
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
Some arches like x86 need all memory mapped so that they can
fetch information placed arbitrarily by firmware, like ACPI
tables.
Ensure that if this is the case, the kernel won't accidentally
clobber it by thinking the relevant virtual memory is unused.
Otherwise this has no effect on page frame management.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
If we evict enough pages to completely fill the backing store,
through APIs like k_mem_map(), z_page_frame_evict(), or
z_mem_page_out(), this will produce a crash the next time we
try to handle a page fault.
The backing store now always reserves a free storage location
for actual page faults.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
This will enable testing of the implementation until the
critical set of pages is identified and known to the
kernel.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Implement runtime APIs for pinning, paging in, and evicting
memory, as well as the page fault hook called from architecture
code.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Architecture layer hooks for demand paging. See
doxygen for these API definitions for more details.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Page tables created at build time may not include the
gperf data at the very end of RAM. Ensure this is mapped
properly at runtime to work around this.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Pre-allocation of paging structures is now required, such that
no allocations are ever needed when mapping memory.
Instantiation of new memory domains may still require allocations
unless a common page table is used.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Allows applications to increase the data space available to Zephyr
via anonymous memory mappings. Loosely based on mmap().
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
The strategy used in z_heap_aligned_alloc() was to allocate an extra
align-sized memory block for storing a pointer to the memory heap.
This is wasteful in terms of memory usage when alignment is larger
than a pointer width. A loop is needed to find the initial memory
start when freeing it which isn't optimal either.
Instead, let's have sys_heap_aligned_alloc() rewind a pointer after
it is aligned to make just enough room for storing our heap reference.
This way the heap reference is always located immediately before the
aligned memory and any unused memory is returned to the heap.
The rewind and alignment values may coincide in which case only
the alignment is necessary anyway.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
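The rewind idea in the commit above can be sketched in standalone C. This is only an illustration built on malloc(), not the Zephyr implementation: `struct demo_heap`, `struct demo_hdr`, `demo_aligned_alloc()` and `demo_free()` are hypothetical names. Unlike sys_heap, malloc() cannot take back the unused padding, so this sketch also stores the raw block pointer in the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct demo_heap {
	int id;
};

/* Header placed immediately before the aligned user pointer. */
struct demo_hdr {
	struct demo_heap *heap; /* heap reference, as in the commit text */
	void *raw;              /* malloc() block start (sketch only) */
};

void *demo_aligned_alloc(struct demo_heap *heap, size_t align, size_t size)
{
	const size_t rewind = sizeof(struct demo_hdr);
	unsigned char *raw;
	uintptr_t user;
	struct demo_hdr hdr;

	/* Alignment must be a power of two. */
	assert(align != 0 && (align & (align - 1)) == 0);

	/* Worst case: room for the header plus alignment padding. */
	raw = malloc(size + rewind + align - 1);
	if (raw == NULL) {
		return NULL;
	}

	/* Align the address that follows the header, then rewind to store it. */
	user = ((uintptr_t)raw + rewind + align - 1) & ~((uintptr_t)align - 1);
	hdr.heap = heap;
	hdr.raw = raw;
	memcpy((void *)(user - rewind), &hdr, sizeof(hdr));

	return (void *)user;
}

/* Frees the block and returns the heap it was allocated from. */
struct demo_heap *demo_free(void *mem)
{
	struct demo_hdr hdr;

	memcpy(&hdr, (unsigned char *)mem - sizeof(hdr), sizeof(hdr));
	free(hdr.raw);
	return hdr.heap;
}
```

Because the header always sits directly before the returned pointer, freeing needs no search loop: one fixed-offset read recovers the heap reference.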
Remove conditionals (PM_DEEP_SLEEP_STATES and PM_SLEEP_STATES) from
power management code. Now these features are always available when
power management is enabled.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
Migrate the whole pm subsystem to use new power states information
from power_state.h and get states and residency properties from
device tree.
Signed-off-by: Flavio Ceolin <flavio.ceolin@intel.com>
The internal API to measure time until a delay expires does not modify
the referenced timeout. Make the functions that call it take pointers
to const objects, so that they can be used with pointer to
const-qualified containers.
Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
This removes the z_ prefix from those (functions, enums, etc.) that
are used outside the coredump subsys. This aligns better
with the naming convention.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Following the idiom used for system calls, add script support to read
the initial application binary to identify which devices are defined,
and to use their offset in the device array as their unique handle
rather than the externally-defined ordinal from devicetree. The
device dependency arrays are updated to use these handles.
Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
The only two supported operations for data caches in the cache framework
are currently arch_dcache_flush() and arch_dcache_invd().
This is quite restrictive because for some architectures we also want to
control i-cache and in general we want a finer control over what can be
flushed, invalidated or cleaned. To address these needs this patch
expands the set of operations that can be performed on data and
instruction caches, adding hooks for the operations on the whole cache,
a specific level or a specific address range.
Signed-off-by: Carlo Caione <ccaione@baylibre.com>
register_event always returns 0, so the conditional will
always take the first branch and code in the else part
is never reached.
Fixes #31282
Signed-off-by: Ningx Zhao <ningx.zhao@intel.com>
1. Exclude the CODE UNREACHABLE line while generating coverage report.
2. Exclude the memory domain deprecated API when calculating code
coverage.
Signed-off-by: Enjia Mai <enjiax.mai@intel.com>
First, the maximum heap size must fit in 31 bits worth of chunks
because the internal 32-bit field holding the size is shared with
the `used` bit.
Then the mention of a 256-byte block in the doc is no longer
relevant. That pertained to the previous allocator implementation.
And ditto for the HEAP_MEM_POOL_MIN_SIZE kconfig option.
Signed-off-by: Nicolas Pitre <npitre@baylibre.com>
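The 31-bit limit mentioned above follows from packing a size and a flag into one 32-bit word. A minimal sketch, assuming the `used` flag occupies the lowest bit and the chunk count the remaining 31 bits (the exact bit placement and the `demo_*` names are assumptions for illustration, not taken from the allocator source):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest chunk count that fits next to the used bit: 2^31 - 1. */
#define DEMO_MAX_CHUNKS 0x7FFFFFFFu

/* Pack a chunk count and the used flag into one 32-bit field. */
uint32_t demo_pack(uint32_t chunks, bool used)
{
	assert(chunks <= DEMO_MAX_CHUNKS);
	return (chunks << 1) | (used ? 1u : 0u);
}

/* Extract the chunk count (upper 31 bits). */
uint32_t demo_size(uint32_t field)
{
	return field >> 1;
}

/* Extract the used flag (lowest bit). */
bool demo_used(uint32_t field)
{
	return (field & 1u) != 0u;
}
```

A count above `DEMO_MAX_CHUNKS` would lose its top bit in the shift, which is why the heap size has to be capped.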
Needing to check the current cycle time (which involves a spinlock and
register read on most architectures) is wasteful in the scheduler
priority predicate, which is a hot path. If we "burn" one bit of
precision (and document the rule), we can do the comparison without
knowing the current time.
2^31 cycles is still far longer than a live deadline thread in any
legitimate realtime app should ever live before being scheduled.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
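A comparison that does not need the current time can be sketched as a signed difference of two 32-bit cycle counts. `demo_deadline_before()` is a hypothetical name, and the sketch assumes the usual two's complement conversion to `int32_t`. The result is only meaningful when the two deadlines lie within 2^31 cycles of each other, which is the bit of precision being "burned".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if deadline d1 expires before deadline d2.
 * The unsigned subtraction wraps modulo 2^32; reinterpreting the
 * result as signed gives the right ordering across counter rollover,
 * provided the deadlines are less than 2^31 cycles apart.
 */
bool demo_deadline_before(uint32_t d1, uint32_t d2)
{
	return (int32_t)(d1 - d2) < 0;
}
```

For example, a deadline of 0xFFFFFFF0 is treated as earlier than one of 0x00000010, since the second lies just past the counter wrap.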
Adds a linker section for Cortex-M instruction tightly coupled memory
(ITCM), similar to the existing section for DTCM. A new executable MPU
region is not added as there isn't currently a need to make this section
accessible to user mode. This section can be enabled by setting a device
tree chosen node zephyr,itcm.
Signed-off-by: Maureen Helm <maureen.helm@nxp.com>
This allows allocating dynamic kernel objects with memory alignment
requirements. The first candidate is for thread objects where,
on some architectures, it must be aligned for saving/restoring
registers.
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
PM depends on SYS_CLOCK_EXISTS in Kconfig but several boards have
Kconfig overrides that allow the dependency to be ignored, so
CONFIG_PM=y even though CONFIG_SYS_CLOCK_EXISTS=n. Fix the code so
that the true dependency is reflected in the generated code.
Signed-off-by: Peter Bigot <peter.bigot@nordicsemi.no>
This change adds z_heap_aligned_alloc() and k_aligned_alloc()
and changes z_heap_malloc() and k_malloc() to be small wrappers around
the aligned variants.
Fixes #29519
Signed-off-by: Christopher Friedt <chrisfriedt@gmail.com>
Ticks should be assigned directly to timeout value in case of
CONFIG_LEGACY_TIMEOUT_API=y, just as they were before referenced patch.
Fixes: 7a815d5d99 ("kernel: sched: Use k_ticks_t in z_tick_sleep")
Signed-off-by: Marcin Niestroj <m.niestroj@grinn-global.com>
Renamed to make its semantics clearer; this function maps
*physical* memory addresses and is not equivalent to
posix mmap(), which might confuse people.
The mem_map test case keeps its name, as other memory
mapping scenarios will be added in the fullness of time.
Parameter names to z_phys_map adjusted slightly to be more
consistent with names used in other memory mapping functions.
Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Inside the idle loop, in some configurations, IRQs are unlocked and
then immediately locked again. This has a side effect:
1. IRQs are unlocked in the middle of the loop.
2. Another thread (A) can now run, so the idle thread is un-scheduled.
3. Thread A runs to its end and goes through the thread
self-abort path.
4. The idle thread is rescheduled and continues to run
the remainder of the loop, where it eventually calls k_cpu_idle().
The "pending abort" path is not executed for thread A
at this point.
5. Now thread A is suspended, and the CPU is idle, waiting
for interrupts (e.g. timeouts).
6. Thread B is waiting to join on thread A. Since thread A has
not been terminated yet, thread B keeps waiting until
the idle thread runs again and starts executing from
the beginning of the while loop.
7. Depending on how many threads are running and how active
the platform is, idle thread may not run again for a while,
resulting in thread B appearing to be stuck.
To avoid this situation, the unlock/lock pair in the middle of
the loop is removed so no rescheduling can be done mid-loop.
When there is no thread abort pending, it simply locks IRQ
and calls k_cpu_idle(). This is almost identical to the idle
loop before the thread abort code was introduced (except
the check for cpu->pending_abort).
Fixes #30573
Signed-off-by: Daniel Leung <daniel.leung@intel.com>
To allow releasing the irq_offload semaphore outside kernel/thread.c,
make it visible by making it non-static under ztest. This is needed,
for example, when irq_offload() is called to enter interrupt context
and a fatal error happens there: the semaphore must then be released
in the fatal handler, or irq_offload() stays locked and can no longer
be used.
Signed-off-by: Enjia Mai <enjiax.mai@intel.com>
Clean up the power management code, remove some duplication, and
isolate power management code from the kernel code.
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
- Remove SYS_ prefix
- shorten POWER_MANAGEMENT to just PM
- DEVICE_POWER_MANAGEMENT -> PM_DEVICE
and use PM_ as the prefix for all PM related Kconfigs
Signed-off-by: Anas Nashif <anas.nashif@intel.com>
k_heap did not have an aligned alloc function, even though
this is supported by the internal sys_heap.
Signed-off-by: Maximilian Bachmann <m.bachmann@acontis.com>
These implemented a k_mem_pool in terms of the now universal k_heap
utility. That's no longer necessary now that the k_mem_pool API has
been removed.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The mailbox and msgq utilities had API variants that could pass old
mem_pool blocks through the data structure. That API is being
deprecated (and the features were obscure), so remove the internal
support.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The k_mem_pool allocator is no more, and the z_mem_pool compatibility
API is going away. The internal allocator should be a k_heap always.
Signed-off-by: Andy Ross <andrew.j.ross@intel.com>