mirror of https://github.com/zephyrproject-rtos/zephyr synced 2025-09-13 03:11:56 +00:00

Andy Ross aa3c9b7f1e tests/benchmarks: Add scheduler microbenchmark Useful tool for performance work that removes interaction with other APIs and thread state. The best part is that it doesn't rely on timer interrupt delivery and so works with -icount even on existing qemu versions and produces deterministic output. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>		2019-02-01 15:57:21 -05:00

Scheduler Microbenchmark
########################

This is a scheduler microbenchmark designed to measure the minimum
latencies (not scaling performance) of specific low-level scheduling
primitives, independent of overhead from application or API
abstractions.  It works very simply: a main thread creates a "partner"
thread at a higher priority, and the partner then blocks itself on a
wait queue using _pend_current_thread().  From this initial state:

1. The main thread calls _unpend_first_thread(), removing the partner
   from the wait queue
2. The main thread calls _ready_thread(), making the partner runnable
3. The main thread calls k_yield()
   (the kernel switches to the partner thread)
4. The partner thread then runs and calls _pend_current_thread() again
   (the kernel switches to the main thread)
5. The main thread returns from k_yield()
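The cycle above can be modeled as a small state machine.  The sketch
below is a hypothetical toy model, not Zephyr code: struct model,
model_init() and run_cycle() are invented for illustration and only
track whether the partner is pended, ready, or running.  It shows that
each cycle costs exactly two context switches (steps 3 and 4), and that
step 2 does not switch threads by itself, since the list above places
the switch at k_yield().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of the benchmark cycle; these are NOT Zephyr
 * types or APIs, just enough state to show which thread runs when. */
enum runner { RUN_MAIN, RUN_PARTNER };

struct model {
	bool partner_pended; /* partner is blocked on the wait queue */
	bool partner_ready;  /* partner is in the run queue */
	enum runner running; /* thread currently on the CPU */
};

/* Initial state: the higher-priority partner has pended itself, so main runs. */
static struct model model_init(void)
{
	struct model m = { true, false, RUN_MAIN };
	return m;
}

/* Simulates steps 1-5 once and returns the number of context switches. */
static int run_cycle(struct model *m)
{
	int switches = 0;

	/* Every cycle starts with the partner pended and main running. */
	assert(m->partner_pended && m->running == RUN_MAIN);

	/* 1. main: _unpend_first_thread() takes the partner off the wait queue. */
	m->partner_pended = false;
	/* 2. main: _ready_thread() makes it runnable; the model does not switch here. */
	m->partner_ready = true;
	/* 3. main: k_yield(); the ready, higher-priority partner gets the CPU. */
	if (m->partner_ready) {
		m->partner_ready = false;
		m->running = RUN_PARTNER;
		switches++;
	}
	/* 4. partner: _pend_current_thread() blocks it again; main is switched back in. */
	m->partner_pended = true;
	m->running = RUN_MAIN;
	switches++;
	/* 5. main returns from k_yield(); the model is back in its initial state. */
	return switches;
}
```

Calling run_cycle() repeatedly after model_init() always returns 2 and
leaves the model in its initial state.  This mirrors how the benchmark
can iterate the same cycle indefinitely.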

It repeats this cycle many times.  For each cycle it reports the
timestamp latency between consecutive numbered steps and for the whole
cycle, along with a running average over all cycles run so far.
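The reporting described above reduces to subtracting consecutive
timestamps and keeping a cumulative sum.  The following is a minimal
sketch, not the benchmark's actual source.  The stamp layout is
hypothetical (one stamp before step 1 plus one after each of the five
steps, so NUM_STAMPS is 6), as are struct stats, delta() and
record_cycle().  The raw stamp values are assumed to come from a
free-running 32-bit counter such as k_cycle_get_32().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: one stamp before step 1, then one after each of
 * the five steps. */
#define NUM_STAMPS 6

struct stats {
	uint64_t total_cycle; /* sum of whole-cycle latencies so far */
	uint32_t cycles;      /* number of cycles recorded */
};

/* Latency between two stamps.  Unsigned subtraction stays correct across
 * a single wraparound of the 32-bit counter. */
static uint32_t delta(uint32_t from, uint32_t to)
{
	return to - from;
}

/* Records one cycle: writes the per-step latencies into step_out and
 * returns the running average of the whole-cycle latency. */
static uint32_t record_cycle(struct stats *s, const uint32_t stamps[NUM_STAMPS],
			     uint32_t step_out[NUM_STAMPS - 1])
{
	for (int i = 0; i < NUM_STAMPS - 1; i++) {
		step_out[i] = delta(stamps[i], stamps[i + 1]);
	}
	s->total_cycle += delta(stamps[0], stamps[NUM_STAMPS - 1]);
	s->cycles++;
	assert(s->cycles > 0); /* guard the division below */
	return (uint32_t)(s->total_cycle / s->cycles);
}
```

Keeping a 64-bit sum and dividing on each call yields the exact running
average (rounded down) without storing per-cycle history.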

Note that because this involves no timer interaction (except, on some
architectures, k_cycle_get_32()), it works correctly when run in QEMU
with the -icount argument.  This can produce 100% deterministic
behavior: not cycle-exact hardware simulation, but a fixed 2^N
nanoseconds of simulated time per instruction for shift=N (so shift=0
gives exactly one instruction per simulated nanosecond).  You can
enable this with the QEMU_EXTRA_FLAGS environment variable.  It must be
set when CMake runs.  Setting it only for the subsequent make or ninja
invocation is not enough, because CMake needs to see the variable
itself::

    export QEMU_EXTRA_FLAGS="-icount shift=0,align=off,sleep=off"
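QEMU documents -icount shift=N as executing one guest instruction every
2^N ns of virtual time.  Simulated time is therefore a pure function of
the number of instructions executed, which is what makes the reported
latencies reproducible.  The helpers below are a hypothetical
illustration of that arithmetic and are not part of the benchmark or of
QEMU.  With shift=0, as in the export line above, one instruction
advances the virtual clock by exactly 1 ns.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: virtual time (in ns) that QEMU's -icount mode
 * attributes to a given number of executed instructions, at 2^shift ns
 * per instruction. */
static uint64_t icount_sim_ns(uint64_t instructions, unsigned int shift)
{
	/* Illustrative sanity bound (an assumption, not a QEMU limit) that
	 * keeps the shift well-defined for a 64-bit result. */
	assert(shift < 32);
	return instructions << shift;
}

/* Hypothetical inverse: instructions executed over a span of virtual
 * time, rounded down when sim_ns is not a multiple of 2^shift. */
static uint64_t icount_instructions(uint64_t sim_ns, unsigned int shift)
{
	assert(shift < 32);
	return sim_ns >> shift;
}
```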