-
Notifications
You must be signed in to change notification settings - Fork 5.2k
SPI Bus performance loss of ~9% under Python moving from Kernel 6.6.31 -> 6.12.25 #6854
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Oops, apologies for the messy formatting of that python code ! |
It might be worth trying the experimental PREEMPT_RT variant of the RPi downstream 6.12 kernel to see if that reduces jitter. To install the kernel run this
|
Hi Tim Many thanks for the suggestion, I gave the PREEMPT_RT kernel variant a go as suggested above, it is still a slight "downgrade" in terms of average sample rate (and has a marginally better inter-sample distribution "width wise" but that also looks a little "lumpy", histograms to follow. Running our ADC sampling code, bound to a reserved CPU 3 as "sudo chrt 99 python3 our_code.py" to set it to a high RT priority gives :- 6.6.31 -> 36.7 kHz average sample rate over 1E6 samples 6.12.28 -> 34.76 average sample rate over 1E6 samples So still a small 5% speed hit 6.6.31 versus 6.12.28 RT (but definitly better than the non RT 6.12.28 tests), and a slight win on clamping the maximum inter-sample interval, with that falling from 84 uSec to 47 uSec. Any other suggestions greatfully received, otherwise sticking with 6.6.31 is probably our best bet for now. Cheers |
Histogram for a "standard" 6.12.28 kernel in non-RT mode below for comparison. |
Since SPI is on RP1 the system is exposed to latency/jitter from PCIe. I'm not aware of any major differences in PCIe setup between 6.6 and 6.12 but others might comment if they know otherwise. |
The non-RT latency tail is far too big for PCIe DMA/link effects to be significant contributors. With short transfer sizes such as these, the hardware is used in interrupt-driven FIFO mode, so one potential source of regression is a kernel task that disables interrupts for longer than it did on 6.6 In RT kernels, almost all mechanisms that disable hardware interrupts are abstracted into a complicated pile of thread-like primitives, so a greedy neighbour cannot prevent higher priority tasks from running. |
We think this may be caused by the addition of resource locking in the use of GPIO descriptors to prevent them being freed while in use, a change that appeared in the 6.9 kernel. This would affect performance for SPI because of the use of software controller/GPIO chip select lines. This is a feature we enabled for older Pis because it is more flexible than the rather restricted CS support implemented by the BCM2835 SPI hardware. It also allows the use of arbitrary GPIOs, rather than being restricted to those where the SPI CS alternate functions are mapped. The SPI controllers in RP1 do support HW CS so it might be possible to use it instead. It would require Device Tree changes to indicate to the SPI subsystem that we wish to do so, and to enable the correct alternate functions on the CS pins. However, it's not a feature that has been used much, and some of the comments in the driver make me think it might be a bit quirky. |
This is the upstream change that may increase overhead when using gpio: https://lwn.net/Articles/960024 |
Many thanks to popcornmix, pelwell, P33M and timg236 for all your insights much appreciated. It sounds like the (minor) slow down we see is the result of some pretty deep under the hood tweaks to avoid gpio users colliding / breaking things, good to know there was a specific cause rather than "bad code". For our application sticking with 6.6.31 for now is I think the way to go as we want to maximise speed and only have one application alternating between SPI and GPIO. Best wishes to all. |
The hardware chipselects are badly implemented. If the TX FIFO empties mid-transfer, the hardware deasserts CS. A couple of other silicon integrators have workarounds in hardware which basically reimplement GPIO-CS functionality inside the SPI hardware block. https://github.com/raspberrypi/linux/blob/rpi-6.12.y/drivers/spi/spi-dw-mmio.c#L151 |
Indeed - the comment before that function was one of those I referred to. I can't help thinking that if your resource lifetime management requires interlocking on every single access, however brief, then you're doing it wrong. Surely one of the great things about descriptors is that they are explicitly claimed and released, and if there is a valid descriptor then the underlying gpio chip object (and its driver) should not be deleted. |
The 6.10 kernel introduced a large patchset designed to make gpiochip usage safe in the presence of potential hotplugging events. The changes included protecting every gpiochip access with a claim of an interlock. Running on a Pi 5 these changes reduce GPIO performance from userspace by around 10%. The penalty would be proportionally higher from kernel, as seen by SPI speed reductions. Patch the gpiolib implementation to remove the protection of gpiochip accesses. By providing alternative implementations of the relevant macros, the changes are localised and therefore easier to verify. See: raspberrypi#6854 https://lwn.net/Articles/960024/ Signed-off-by: Phil Elwell <[email protected]>
The 6.9 kernel introduced a large patchset [1] designed to make gpiochip usage safe in the presence of potential hotplugging events. The changes included protecting every gpiochip access with a claim of an interlock. Running on a Pi 5 these changes reduce GPIO performance from userspace by around 10%. The penalty would be proportionally higher from kernel, as seen by SPI speed reductions. Patch the gpiolib implementation to remove the protection of gpiochip accesses. By providing alternative implementations of the relevant macros, the changes are localised and therefore easier to verify. See: raspberrypi#6854 [1] https://lwn.net/Articles/960024/ Signed-off-by: Phil Elwell <[email protected]>
#6862 is a trial patch to reduce (remove?) the overhead of the GPIO chip safety measures. In about 40 minutes you'll be able to install a trial build of it by running |
Many thanks, that's very helpful, happy to give some beta software a go, warnings on data backup noted :). |
The 6.9 kernel introduced a large patchset [1] designed to make gpiochip usage safe in the presence of potential hotplugging events. The changes included protecting every gpiochip access with a claim of an interlock. Running on a Pi 5 these changes reduce GPIO performance from userspace by around 10%. The penalty would be proportionally higher from kernel, as seen by SPI speed reductions. Patch the gpiolib implementation to remove the protection of gpiochip accesses. By providing alternative implementations of the relevant macros, the changes are localised and therefore easier to verify. See: #6854 [1] https://lwn.net/Articles/960024/ Signed-off-by: Phil Elwell <[email protected]>
Uh oh!
There was an error while loading. Please reload this page.
Describe the bug
I am using python (3.11.2) running on a Pi5 to drive an AD7606 ADC with a loop that (1) triggers a hardware sample event using a pulse from a GPIO line (libgpio 1.6.3-1) (2) waits 3-uSec (the hardware acquisition time) followed by (3) read back of 12 bit ADC data via the SPI bus formatted as a pair of int16 values using the spidev (3.6-1) xfer2 function followed by (4) recording a time-stamp using time.perf_counter_ns.
Updating from RPi OS kernel version 6.6.31 -> 6.12.25 the average data rate drops from 35.5 -> 32.5 kHz, a ~9% performance hit. The culprit seems to be the SPI read using spidev's xfer2 function, that step increases from an average of 21.6 -> 24.3 uSec, along with a significant in increase in timing jitter.
Reverting back to 6.6.31 recovers the performance.
Stripping the python code back to a minimalist loop that only runs the spidev xfer2 call (with no hardware in place) shows the same performance drop. Plotting a histogram of the time taken for each sample loop also highlights much more loop-to-loop timing jitter with the 6.12.25 kernel.
Demonstration Python code below.
Steps to reproduce the behaviour
Steps to reproduce the behaviour*
Run the python code below to benchmark the performance of the SPI bus using spidev xfer2 on an RPi5 with a 6.6.31 OS Kernel. No additional hardware required.
Upgrade the Kernel to 6.12.25 (no other packages updates).
Re-run the python code to benchmark the performance of spidev xfer2 and compare before / after average loop time and loop-to-loop timing jitter.
########### Test code ############
Device (s)
Raspberry Pi CM5
System
"Problem" state post update.
cat /etc/rpi-issue
Raspberry Pi reference 2024-03-15
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, f19ee211ddafcae300827f953d143de92a5c6624, stage4
vcgencmd version
2025/05/08 15:13:17
Copyright (c) 2012 Broadcom
version 69471177 (release) (embedded)
uname -a
Linux raspberrypi 6.12.25+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.12.25-1+rpt1 (2025-04-30) aarch64 GNU/Linux
Initial "problem free" state before update.
cat /etc/rpi-issue
Raspberry Pi reference 2024-03-15
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, f19ee211ddafcae300827f953d143de92a5c6624, stage4
vcgencmd version
2025/05/08 15:13:17
Copyright (c) 2012 Broadcom
version 69471177 (release) (embedded)
uname -a
Linux raspberrypi 6.6.31+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.31-1+rpt1 (2024-05-29) aarch64 GNU/Linux
Logs
NA
Additional context
We are aiming for "fast" low jitter performance of our ADC system so the drop in performance with a new kernel was a bit of a surprise. All versions of the 6.12.XX kernel tested seem to ahve the same issue. Running python code bound to a single reserved core (import psutil ; p = psutil.Process() ; p.cpu_affinity([3] ) helps slightly with overall speed, but reduces loop-to-loop jitter rather nicely. However this and other minor tweaks, the GPIO hardware trigger, hardware settling time delay etc were unaffected by the kernel update and were all stripped out for the before / after testing code.
The text was updated successfully, but these errors were encountered: