Implement Cortex-R floating-point support #19979

stephanosio · 2019-10-21T11:11:32Z

At the time of writing, the ARM Cortex-R port does not support the use of the hardware floating-point unit (VFP and NEON).

Considering the common application scenarios for Cortex-R (real-time processing), it is imperative that hardware floating-point unit support be available for it; otherwise, the practical usability of the Cortex-R port becomes questionable.

Overview

An overview of the hardware floating-point unit for Cortex-R is as follows:

Cortex-R4 and Cortex-R5
- VFPv3-D16 may be optionally instantiated.
- 16 double-precision registers are available (instead of 32 in full VFPv3).
- Single-precision and double-precision operations are supported.
- Half-precision operations are not supported.
- Vector operations are not supported (software emulation is possible).
- R5 may also optionally instantiate single-precision only VFPv3-D16.
Cortex-R7 and Cortex-R8
- VFPv3-D16-FP16 may be optionally instantiated.
- Two variants of VFPv3-D16-FP16 are available: Optimised and Full.
- Optimised variant implements only single-precision and half-precision operations.
- Full variant implements single-precision, half-precision and double-precision operations.
- Vector operations are not supported (software emulation is possible).
Cortex-R52
- VFPv3-FP16 or VFPv3-D16-FP16 must be instantiated.
- Two variants are available: SP-only and Full Advanced SIMD.
- SP-only variant
  - VFPv3-D16 (i.e. with 16 double-precision registers) is instantiated.
  - Half-precision and single-precision operations are supported.
- Full Advanced SIMD variant
  - VFPv3 (i.e. with 32 double-precision registers) is instantiated.
  - NEON is instantiated.
  - Half-precision, single-precision, double-precision and vector operations are supported (with some optional subset specified by MVFR).

Specifications

The ARM Cortex-R floating-point support feature shall:

1. support Unshared FP registers mode and Shared FP registers mode.
   i. In both modes, the kernel shall initialise the FPU. In addition, all FP registers shall be initialised if DCLS (dual-redundant core lock-step) is configured.
   ii. In Unshared FP registers mode, the kernel shall configure the FPU and leave it in operational state after booting.
   iii. In Shared FP registers mode, all threads shall assume "FP disabled" state by default.
2. optionally support emulation of the VFP instructions that are unimplemented by hardware.

The ARM Cortex-R floating-point support feature, for Shared FP registers mode, shall:

3. manage FP enable status at thread level, in conformance with the kernel FP interface.
   i. The K_FP_REGS option shall be used to specify whether thread-wide floating-point support is enabled.
   ii. The K_FP_REGS option may be (re-)enabled only for the threads that were initially created with the same option.
4. disable the FPU after a context switch and re-enable it upon exception.
   i. After a context switch occurs, the FPU shall be disabled by setting FPEXC.EN=0.
   ii. The FPU shall be re-enabled after the first FP instruction that caused an undefined instruction exception if and only if the K_FP_REGS option is set for the thread.
   iii. The first FP instruction that caused an undefined instruction exception shall be re-executed after setting FPEXC.EN=1.
5. store s0-s15 and FPSCR in the exception stack frame and s16-s31 in the thread context.
   i. s0-s15 and FPSCR shall be stored in the exception stack frame in order to allow the optional use of FP registers inside a nested interrupt handler.
   ii. s16-s31 shall be preserved only for a thread context switch (and not for an exception entry).
6. implement lazy stacking of FP context.
   i. All context switching-capable exception handler routines shall save only the basic stack frame upon entry.
   ii. s0-s15 shall be stored to the exception stack frame during a context switch if the FPU is enabled; in which case, the thread being switched out shall be marked to indicate that FP context is saved.
   iii. s0-s15 shall be restored if the thread being switched in is marked to contain an FP context.
7. preserve s16-s31 during a context switch only when the FPU is enabled.
   i. If the FPU is enabled at the time of context switching, at least one FP instruction has been executed after the previous context switch and the FP context must be preserved.
   ii. FP context preservation of s16-s31 shall be implied by that of s0-s15.

Note

This feature will also be applicable to the Cortex-A port when it is added in the future, as the Cortex-R architecture is very similar to Cortex-A.
Specification 3-ii should really apply to Cortex-M as well.


bbolen · 2020-07-22T16:31:19Z

@stephanosio I found your https://github.com/stephanosio/zephyr/commits/aarch32_non_m_fp_alt branch and pulled those changes. I wanted to see if you have done any more work on this issue, specifically have you implemented anything mentioned in the "Specifications 3-7" section above?

stephanosio · 2020-07-23T05:48:22Z

> @stephanosio I found your https://github.com/stephanosio/zephyr/commits/aarch32_non_m_fp_alt branch and pulled those changes. I wanted to see if you have done any more work on this issue, specifically have you implemented anything mentioned in the "Specifications 3-7" section above?

@bbolen There is a Cortex-A (and -R) FP sharing implementation that sort of works here:
https://github.com/ibirnbaum/zephyr/blob/armv7_cortex_a/arch/arm/core/aarch32/swap_helper.S

bbolen · 2020-07-27T15:29:55Z

@stephanosio thank you

bbolen · 2020-10-12T20:05:02Z

Can you elaborate on the description above with respect to items 5 and 6? I'm struggling to understand why the VFP registers would be saved on the exception stack and not in the thread context in the normal case. I can see needing to temporarily put them on the exception stack in order for the exception handler to use the VFP unit, but I would assume they would be popped off that stack and pushed onto the thread context during the context switch.

bbolen · 2021-01-11T17:06:15Z

Here is a working implementation of floating-point support for Cortex-R. It does lazy context switching. It is based on v2.3.0. There are some conflicts with HEAD, but it will be a while before I can get around to looking at those. I'm putting this out there in case others need a starting point for FPU support before I can get this merge-worthy.

https://github.com/bbolen/zephyr/commits/cortex_r_fpu

legath · 2021-08-09T19:42:21Z

Some R4F SoCs have double precision. For example, a quote from the TI Hercules brochure:
Floating Point Unit (FPU)
• FPU is compliant to IEEE754
• 16 double-word (64 bits) registers
• 32 single-word (32 bits) registers
• Supports features:
  – Single-precision and double-precision add, subtract, multiply, divide, multiply and accumulate, and square root operations
  – Conversions between fixed-point and floating-point data formats, etc
  – Comparisons
  – Underflow
  – Exceptions

shrhrw · 2021-09-13T16:35:46Z

Hi @bbolen, I am working on supporting a Cortex-R5F chip in Zephyr, and I've spliced in your code from this post: #19979 (comment)

I have it building; however, when trying to flash it onto the board, I am encountering the following error:
Debug: 387 144 cortex_a.c:301 cortex_a_exec_opcode(): exec opcode 0xee000e15
Debug: 388 145 armv4_5.c:496 arm_set_cpsr(): set CPSR 0x000003db: Undefined instruction mode, ARM state

When looking at the ARM documentation here: https://developer.arm.com/documentation/ddi0406/b/System-Level-Architecture/The-System-Level-Programmers--Model/Exceptions/Undefined-Instruction-exception?lang=en

I found this section:
The Undefined Instruction exception can be used for:

software emulation of a coprocessor in a system that does not have the physical coprocessor hardware
lazy context switching of coprocessor registers
general-purpose instruction set extension by software emulation
signaling an illegal instruction execution
division by zero errors.

Do you know if my error is a coincidence, or related in the way described? If so, do you have a suggestion?

bbolen · 2021-09-15T00:46:33Z

It could be related. The FPU is usually disabled. When the code gets to a floating point instruction, an undefined instruction happens, the FPU gets enabled, and execution starts again on the floating point instruction that caused the fault. So one undefined instruction exception would be expected when using floating point, but it wouldn't crash anything.

I'm unavailable for the rest of the week, but I'll look closer at your details above on Monday.

shrhrw · 2021-09-15T18:42:13Z

`Open On-Chip Debugger 0.11.0+dev-00242-g7036ed509-dirty (2021-08-03-17:04)
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : TI BE-32 quirks mode is enabled
Info : XDS110: connected
Info : XDS110: vid/pid = 0451/bef3
Info : XDS110: firmware version = 3.0.0.16
Info : XDS110: hardware version = 0x0029
Info : XDS110: connected to target via JTAG
Info : XDS110: TCK set to 2500 kHz
Info : clock speed 1500 kHz
Info : JTAG tap: tms570.jrc tap/device found: 0x1b95a02f (mfg: 0x017 (Texas Instruments), part: 0xb95a, ver: 0x1)
Info : JTAG tap: tms570.cpu enabled
Info : tms570.cpu: hardware has 8 breakpoints, 8 watchpoints
Info : starting gdb server for tms570.cpu on 3333
Info : Listening on port 3333 for gdb connections
TargetName Type Endian TapName State

0* tms570.cpu cortex_r4 big tms570.cpu running

Info : JTAG tap: tms570.jrc tap/device found: 0x1b95a02f (mfg: 0x017 (Texas Instruments), part: 0xb95a, ver: 0x1)
Info : JTAG tap: tms570.cpu enabled
Warn : tms570.cpu: ran after reset and before halt ...
Info : tms570.cpu: MPIDR level2 0, cluster 0, core 0, mono core, no SMT
target halted in ARM state due to debug-request, current mode: Undefined instruction
cpsr: 0x000003db pc: 0x00000004
D-Cache: disabled, I-Cache: disabled
flash
flash bank bank_id driver_name base_address size_bytes chip_width_bytes
bus_width_bytes target [driver_options ...]
flash banks
flash init
flash list
gdb_flash_program ('enable'|'disable')
nand
program [address] [pre-verify] [verify] [reset] [exit]

Info : XDS110: disconnected
FATAL ERROR: command exited with status 1`

-- Application: /home/smith/zephyrproject/zephyr/samples/hello_world
-- Zephyr version: 2.7.0-rc1 (/home/smith/zephyrproject/zephyr), build: v1.12.0-34809-g29387287d9f7
-- Found Python3: /usr/bin/python3.8 (found suitable exact version "3.8.10") found components: Interpreter
-- Found west (found suitable version "0.11.1", minimum required is "0.7.1")
-- Board: hercules_tms570lc43x
-- Cache files will be written to: /home/smith/.cache/zephyr
-- Using toolchain: zephyr 0.13.0 (/home/smith/zephyr-sdk-0.13.0)
-- Open On-Chip Debugger 0.11.0+dev-00358-g6c1e1a212-dirty (2021-08-26-13:54)

My local zephyr repo was cloned from your repo here: https://github.com/bbolen/zephyr/commits/cortex_r_fpu and I updated it to the latest version.

stephanosio added the Feature Request, area: Kernel and area: ARM labels Oct 21, 2019

stephanosio self-assigned this Oct 21, 2019

stephanosio mentioned this issue Oct 22, 2019

ARM Cortex A Architecture support - ARMv8-A #11172

Closed

stephanosio mentioned this issue Nov 12, 2019

Cortex-R Improvement #20594

Closed

25 tasks

carlescufi added the Feature label and removed the Feature Request label Mar 24, 2020

stephanosio mentioned this issue May 3, 2020

arch: arm: aarch32: When CPU_HAS_FPU for Cortex-R5 is selected, prep_c.c uses undefined symbols #24911

Closed

stephanosio added this to the v2.4.0 milestone May 11, 2020

stephanosio removed this from the v2.4.0 milestone Jun 11, 2021

stephanosio assigned stephanosio and bbolen and unassigned stephanosio Aug 13, 2021

stephanosio mentioned this issue Apr 12, 2022

Cortex-R Floating Point Support #44753

Merged

stephanosio closed this as completed in #44753 May 5, 2022
