
Enabling CONFIG_TIMESLICING and CONFIG_TICKLESS_KERNEL at the same time makes thread switching much slower #88353


Open

fkokosinski opened this issue Apr 9, 2025 · 4 comments · May be fixed by #89426

Labels: area: Kernel · bug (The issue is a bug, or the PR is fixing a bug) · priority: medium (Medium impact/importance bug)

Comments

@fkokosinski (Member)

Describe the bug
When CONFIG_TIMESLICING (with CONFIG_TIMESLICE_SIZE set to a non-zero value) and CONFIG_TICKLESS_KERNEL are enabled at the same time, the time required to switch threads is noticeably higher. In the outlier case, on qemu_x86_64 with KVM enabled and a single CPU, the test code ran >100 times slower after enabling tickless mode (without KVM and with the default two CPUs, enabling tickless mode made it ~2 times slower). On the mimxrt685_evk platform it was ~10 times slower, so the exact slowdown depends on the platform.

The timeslice size doesn't seem to affect the results in a noticeable way.

To Reproduce
Run the following code with and without CONFIG_TICKLESS_KERNEL using the config below. The sample creates THRD_C threads and THRD_C semaphores, and passes control to the next thread REPEATS times in a loop. At the end it prints the runtime.

CONFIG_TIMESLICING=y
# the timeslice size doesn't affect the runtime noticeably, it just has to be bigger than 0
CONFIG_TIMESLICE_SIZE=10
# enabling tickless mode makes it slower
CONFIG_TICKLESS_KERNEL=y

# in case of qemu_x86_64 with kvm, enabling just one core and disabling SMP makes the difference between tickless and non-tickless bigger
# CONFIG_SMP=n
# CONFIG_MP_MAX_NUM_CPUS=1

CONFIG_PICOLIBC_IO_FLOAT=y

#include <zephyr/kernel.h>
#include <stdio.h>

#define THRD_C 2
#define REPEATS 100000

int64_t start_time;
int c = 0;

struct k_thread thrds[THRD_C];
K_THREAD_STACK_ARRAY_DEFINE(thrd_stacks, THRD_C, 4096);
struct k_sem smphs[THRD_C];

int ids[THRD_C];

void thrd_func(void *arg, void *p2, void *p3)
{
	ARG_UNUSED(p2);
	ARG_UNUSED(p3);

	int id = *(int *)arg;

	while (1) {
		k_sem_take(&smphs[id], K_FOREVER);
		c++;
		if (c >= REPEATS){
			printf("time %f\n", (k_uptime_get() - start_time) / 1e3);
			return;
		}
		k_sem_give(&smphs[(id + 1) % THRD_C]);
	}
}

int main(void)
{
	for (int i = 0; i < THRD_C; i++){
		ids[i] = i;
		k_sem_init(&smphs[i], 0, 1);

		(void)k_thread_create(&thrds[i], thrd_stacks[i],
				K_THREAD_STACK_SIZEOF(thrd_stacks[i]), thrd_func, &ids[i], NULL,
				NULL, 0, 0, K_NO_WAIT);
	}

	start_time = k_uptime_get();
	k_sem_give(&smphs[0]);

	for (int i = 0; i < THRD_C; i++){
		k_thread_join(&thrds[i], K_FOREVER);
	}

	return 0;
}

The snippet was built and run using the standard west build -b <board_name> followed by west build -t run or west flash (depending on the platform type), except for qemu_x86_64 with a single CPU, which had to be launched manually due to a conflict between the -enable-kvm and -icount parameters:
qemu-system-x86_64 -nographic -m 32 -enable-kvm -device loader,file=build/zephyr/zephyr-qemu-main.elf -kernel build/zephyr/zephyr-qemu-locore.elf.

Measured runtimes

Platform                         Time w/ tickless   Time w/o tickless
qemu_x86_64 2cpu                 1.15 s             0.45 s
qemu_x86_64 2cpu kvm             0.27 s             0.14 s
qemu_x86_64 1cpu kvm             2.12 s             0.02 s
qemu_x86_64 1cpu no smp kvm      4.13 s             0.01 s
mimxrt685_evk/mimxrt685s/cm33    3.02 s             0.31 s

Expected behavior
Enabling tickless mode shouldn't impact execution time noticeably.

Impact
In one application we noticed delays of multiple seconds after starting worker threads when tickless mode and timeslicing were enabled at the same time.

Environment:

  • OS: Linux
  • Toolchain: Zephyr SDK 0.16.8
  • Commit SHA: c53fb67
fkokosinski added the area: Kernel and bug labels on Apr 9, 2025
@peter-mitsis (Collaborator)

This test seems to be designed to elicit the worst-case scenario of time-slicing. Each time a thread is switched out, the timeout associated with the current/old timeslice must be cancelled and a new one created. The test then hammers context switching via k_sem_give() and a blocking k_sem_take(). As a result, we wind up doing a lot of extra operations on the timeout queue.

With the above in mind, it would not surprise me if we got similar measurements if time-slicing was disabled, but we provided a finite timeout to k_sem_take().
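To make that hypothesis concrete, here is a hedged sketch of how the reproducer's thread loop could be modified to exercise the timeout queue with CONFIG_TIMESLICING=n: the K_FOREVER wait is replaced by a finite timeout. The K_MSEC(100) value and the retry-on-timeout handling are illustrative choices, not taken from the original report.

/* Variant of thrd_func()'s loop: with time slicing disabled, every
 * k_sem_take() below still arms a timeout (and cancels it on success),
 * so each context switch keeps adding/removing entries on the timeout
 * queue, reproducing the cost described above. */
while (1) {
	if (k_sem_take(&smphs[id], K_MSEC(100)) != 0) {
		continue; /* timed out: retry the wait */
	}
	c++;
	if (c >= REPEATS) {
		printf("time %f\n", (k_uptime_get() - start_time) / 1e3);
		return;
	}
	k_sem_give(&smphs[(id + 1) % THRD_C]);
}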

All this being said, I still plan to take a closer look to better gauge what might be done about this. (Incidentally, this is arguably more of an enhancement as opposed to a bug.)

nashif added the priority: medium (Medium impact/importance bug) label on Apr 14, 2025
@peter-mitsis (Collaborator)

Update:

  1. Using the disco_l475_iot1 board as a reference, we can get about a +4% performance boost simply by inlining the routine remove_timeout().
  2. As mentioned previously, the context switching is hammering the timeout aborting and setting, and I am looking into the possibility of bypassing some of that. As our documentation states, time slicing only guarantees the maximum amount of time before a thread yields to another of equal priority (if one exists). One interpretation of this is that we may not need to abort and reset the time slice timeout every time we switch in a new thread. That is, if the new thread can be sliced, then we may be able to bypass this costly step and piggyback on an existing time slice timeout (see the sketch after this list).
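A minimal, self-contained sketch of that second idea follows. It is not Zephyr kernel code: the names slice_state, arm_slice_timeout and slice_switch_in are hypothetical and exist only to illustrate re-using an already-armed slice timeout instead of aborting and re-arming it on every switch.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of per-CPU time-slice bookkeeping (not Zephyr code). */
struct slice_state {
	bool timeout_armed;  /* a slice timeout is already pending     */
	int rearm_count;     /* how many times we touched the hardware */
};

/* Stand-in for the expensive part: aborting/re-adding the timeout and
 * reprogramming the timer hardware. */
static void arm_slice_timeout(struct slice_state *s)
{
	s->timeout_armed = true;
	s->rearm_count++;
}

/* Called on every switch-in of a sliceable thread. The key idea: if a
 * slice timeout is already pending, piggyback on it -- the new thread may
 * get less than a full slice, which is still within the documented
 * "maximum time before yielding" guarantee. */
static void slice_switch_in(struct slice_state *s)
{
	if (!s->timeout_armed) {
		arm_slice_timeout(s);
	}
	/* else: reuse the pending timeout, skipping the costly abort/re-add */
}

int main(void)
{
	struct slice_state s = { 0 };

	/* Simulate many rapid switches between two threads, as in the
	 * reproducer: only the first switch pays for arming the timeout. */
	for (int i = 0; i < 1000; i++) {
		slice_switch_in(&s);
	}
	printf("timer reprogrammed %d time(s) for 1000 switches\n",
	       s.rearm_count);
	return 0;
}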

@fkokosinski (Member, Author)

Hey @peter-mitsis, thanks for taking the time to look into this!

With the above in mind, it would not surprise me if we got similar measurements if time-slicing was disabled, but we provided a finite timeout to k_sem_take().

Would this explain the difference we observed with CONFIG_TICKLESS_KERNEL enabled/disabled as well?

@peter-mitsis (Collaborator)

@fkokosinski - Thanks for drawing my attention back to the tickless aspect, as I was getting rather focused on the time slicing. Yes, I think this would explain the differences observed with CONFIG_TICKLESS_KERNEL enabled/disabled as well.

In the provided code sample, the only timeout expected to be present in the system is the one associated with the time slice. Consequently, when that timeout is added via z_add_timeout(), we are going to use the tickless kernel version of sys_clock_set_timeout() on each of those calls. This in turn accesses the system timer registers, which is often slow (as pointed out in recent PRs such as #87948).

(It is worth noting that the non-tickless version of sys_clock_set_timeout() is essentially a no-op.)
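A rough, self-contained illustration of that contrast follows. It is not the actual Zephyr timer-driver code: fake_compare_reg, reg_writes, model_set_timeout() and the loop counts are made up purely to show where the per-call cost goes (the real driver hook the kernel calls is sys_clock_set_timeout(int32_t ticks, bool idle)).

#include <stdint.h>
#include <stdio.h>

/* Stand-in for a memory-mapped timer compare register; in a real driver
 * this write is the slow system-register/MMIO access. */
static volatile uint32_t fake_compare_reg;
static unsigned int reg_writes;

/* Simplified model of the timeout-setting hook called from z_add_timeout(). */
static void model_set_timeout(int32_t ticks, int tickless)
{
	if (tickless) {
		/* Tickless: reprogram the comparator on every call. */
		fake_compare_reg = (uint32_t)ticks;
		reg_writes++;
	}
	/* Non-tickless: nothing to do; the periodic tick interrupt is
	 * already programmed, so the call is essentially a no-op. */
}

int main(void)
{
	/* Simulate the reproducer: one timeout set per context switch. */
	for (int i = 0; i < 200000; i++) {
		model_set_timeout(10, 1 /* tickless */);
	}
	printf("tickless: %u timer register writes\n", reg_writes);

	reg_writes = 0;
	for (int i = 0; i < 200000; i++) {
		model_set_timeout(10, 0 /* non-tickless */);
	}
	printf("non-tickless: %u timer register writes\n", reg_writes);
	return 0;
}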

kartben linked a pull request (#89426) on May 3, 2025 that will close this issue