When CONFIG_ARCH_HAS_CUSTOM_BUSY_WAIT is not defined, cycles_to_wait
is calculated using a division operation. This calculation could take a
significant amount of time (a few microseconds on some architectures,
depending on the system clock).
In the special case of zero usec_to_wait, the function should return
immediately rather than spend time on calculations.
For example, the SPI driver (spi_context.h, _spi_context_cs_control())
can call k_busy_wait() with a zero delay, which can significantly
increase SPI transaction time.
Another improvement is moving the start_cycles initialization before
the cycles_to_wait calculation, so that the time spent calculating
cycles_to_wait counts toward the requested wait.
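A minimal sketch of the resulting control flow, for illustration only.
It is not the actual kernel code: cycle_get_32(), hw_cycles_per_sec(),
fake_cycles and division_count are hypothetical stand-ins for the
kernel's cycle counter and clock rate, so that the logic can run on a
host. The 32 MHz clock rate is likewise an assumption.

```c
#include <stdint.h>

#define USEC_PER_SEC 1000000U

/* Hypothetical stand-ins, not kernel APIs: a fake cycle counter that
 * advances by 8 on every read, and a counter of divisions performed.
 */
static uint32_t fake_cycles;
static uint32_t division_count;

static uint32_t cycle_get_32(void)
{
	fake_cycles += 8U;
	return fake_cycles;
}

static uint32_t hw_cycles_per_sec(void)
{
	return 32000000U; /* assumed 32 MHz system clock */
}

static void busy_wait_sketch(uint32_t usec_to_wait)
{
	/* Zero delay: return before touching the counter or dividing. */
	if (usec_to_wait == 0U) {
		return;
	}

	/* Sample the start time first, so that the division below is
	 * counted as part of the wait.
	 */
	uint32_t start_cycles = cycle_get_32();

	division_count++;
	uint32_t cycles_to_wait = (uint32_t)(
		((uint64_t)usec_to_wait * hw_cycles_per_sec()) /
		USEC_PER_SEC);

	for (;;) {
		uint32_t current_cycles = cycle_get_32();

		/* Unsigned subtraction handles counter wraparound. */
		if ((current_cycles - start_cycles) >= cycles_to_wait) {
			break;
		}
	}
}
```

With this ordering, busy_wait_sketch(0) neither reads the counter nor
divides, and a non-zero delay is measured from before the division.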
Signed-off-by: David Komel <a8961713@gmail.com>