Implement Cortex-R floating-point support #19979

Closed
stephanosio opened this issue Oct 21, 2019 · 9 comments · Fixed by #44753
Labels
area: ARM ARM (32-bit) Architecture area: Kernel Feature A planned feature with a milestone

Comments

@stephanosio
Member

stephanosio commented Oct 21, 2019

At the time of writing, the ARM Cortex-R port does not support the use of the hardware floating-point unit (VFP and NEON).

Considering common application scenarios for Cortex-R (real-time processing), it is imperative that hardware floating-point unit support is available for it; otherwise, practical usability of the Cortex-R port becomes questionable.

Overview

An overview of the hardware floating-point unit for Cortex-R is as follows:

  • Cortex-R4 and Cortex-R5
    • VFPv3-D16 may be optionally instantiated.
    • 16 double-precision registers are available (instead of 32 in full VFPv3).
    • Single-precision and double-precision operations are supported.
    • Half-precision operations are not supported.
    • Vector operations are not supported (software emulation is possible).
    • R5 may also optionally instantiate single-precision only VFPv3-D16.
  • Cortex-R7 and Cortex-R8
    • VFPv3-D16-FP16 may be optionally instantiated.
    • Two variants of VFPv3-D16-FP16 are available: Optimised and Full.
    • Optimised variant implements only single-precision and half-precision operations.
    • Full variant implements single-precision, half-precision and double-precision operations.
    • Vector operations are not supported (software emulation is possible).
  • Cortex-R52
    • VFPv3-FP16 or VFPv3-D16-FP16 must be instantiated.
    • Two variants are available: SP-only and Full Advanced SIMD.
    • SP-only variant
      • VFPv3-D16 (i.e. with 16 double-precision registers) is instantiated.
      • Half-precision and single-precision operations are supported.
    • Full Advanced SIMD variant
      • VFPv3 (i.e. with 32 double-precision registers) is instantiated.
      • NEON is instantiated.
      • Half-precision, single-precision, double-precision and vector operations are supported (with some optional subset specified by MVFR).
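
The variants listed above can be told apart at run time from the Media and VFP Feature Register 0 (MVFR0). The following is a minimal sketch, assuming the ARMv7-A/R field layout (register bank size in bits [3:0], single-precision support in bits [7:4], double-precision support in bits [11:8]). The names `fpu_features` and `decode_mvfr0` are hypothetical and not part of Zephyr; on a target, the input value would come from a `vmrs` read of MVFR0, which is omitted here so that the sketch also builds on a host.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the MVFR0 fields relevant to the overview. */
struct fpu_features {
    unsigned int dp_regs;   /* number of 64-bit registers: 16, 32, or 0 */
    bool single_precision;  /* single-precision operations implemented */
    bool double_precision;  /* double-precision operations implemented */
};

/* Extract one 4-bit MVFR0 field starting at bit position `shift`. */
static inline uint32_t mvfr0_field(uint32_t mvfr0, unsigned int shift)
{
    return (mvfr0 >> shift) & 0xFu;
}

/* Decode a raw MVFR0 value into the feature summary above. */
struct fpu_features decode_mvfr0(uint32_t mvfr0)
{
    struct fpu_features f = { 0, false, false };

    switch (mvfr0_field(mvfr0, 0)) {
    case 1:
        f.dp_regs = 16;     /* D16 variants */
        break;
    case 2:
        f.dp_regs = 32;     /* full register bank */
        break;
    default:
        f.dp_regs = 0;      /* no bank reported, or unknown encoding */
        break;
    }

    /* A non-zero field means the operation class is implemented. */
    f.single_precision = mvfr0_field(mvfr0, 4) != 0;
    f.double_precision = mvfr0_field(mvfr0, 8) != 0;

    return f;
}
```

For example, a value whose low 12 bits are `0x021` would describe a single-precision-only D16 unit, while `0x222` would describe a unit with 32 double-precision registers and both precisions.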

Specifications

ARM Cortex-R floating-point support feature shall:

  1. support Unshared FP registers mode and Shared FP registers mode.

    1. In both modes, the kernel shall initialise the FPU. In addition, all FP registers shall be initialised if DCLS (dual-redundant core lock-step) is configured.
    2. In Unshared FP registers mode, the kernel shall configure the FPU and leave it in operational state after booting.
    3. In Shared FP registers mode, all threads shall assume "FP disabled" state by default.
  2. optionally support emulation of the VFP instructions that are unimplemented by hardware.

ARM Cortex-R floating-point support feature, for Shared FP registers mode, shall:

  1. manage FP enable status at thread level, in conformance with the kernel FP interface.

    1. K_FP_REGS option shall be used to specify whether thread-wide floating-point support is enabled.
    2. K_FP_REGS option may be (re-)enabled only for the threads that were initially created with the same option.
  2. disable FPU after a context switch and re-enable it upon exception.

    1. After a context switch occurs, the FPU shall be disabled by setting FPEXC.EN=0.
    2. FPU shall be re-enabled after the first FP instruction that caused undefined instruction exception if and only if K_FP_REGS option is set for the thread.
    3. The first FP instruction that caused an undefined instruction exception shall be re-executed after setting FPEXC.EN=1.
  3. store s0-s15 and FPSCR in exception stack frame and s16-s31 in thread context.

    1. s0-s15 and FPSCR shall be stored in the exception stack frame in order to allow the optional use of FP registers inside a nested interrupt handler.
    2. s16-s31 shall be preserved only for a thread context switch (and not for an exception entry).
  4. implement lazy stacking of FP context.

    1. All context switching-capable exception handler routines shall save only the basic stack frame upon entry.
    2. s0-s15 shall be stored to the exception stack frame during a context switch if the FPU is enabled, in which case the thread being switched out shall be marked to indicate that its FP context is saved.
    3. s0-s15 shall be restored if the thread being switched in is marked to contain an FP context.
  5. preserve s16-s31 during context switch only when FPU is enabled.

    1. If the FPU is enabled at the time of context switching, at least one FP instruction has been executed since the previous context switch and the FP context must be preserved.
    2. FP context preservation of s16-s31 shall be implied by that of s0-s15.
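
The Shared FP registers mode rules above can be sketched as a small host-side model. This is an illustration under stated assumptions, not the Zephyr implementation: the FPU and the threads are plain structs, `MODEL_K_FP_REGS` is a hypothetical stand-in for the real `K_FP_REGS` option, and all function names are invented. Only the position of the FPEXC.EN bit (bit 30) is taken from the architecture. The model saves s0-s31 together, following the rule that preservation of s16-s31 is implied by that of s0-s15.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FPEXC_EN        (1u << 30)  /* FPEXC.EN bit position */
#define MODEL_K_FP_REGS (1u << 0)   /* hypothetical stand-in for K_FP_REGS */

/* Modelled FPU state; on hardware these are real registers. */
struct model_fpu {
    uint32_t fpexc;
    uint32_t fpscr;
    float s[32];
};

/* Modelled per-thread state. */
struct model_thread {
    uint32_t options;       /* MODEL_K_FP_REGS if the thread may use FP */
    bool fp_ctx_saved;      /* marks that s[] and fpscr hold a valid context */
    uint32_t fpscr;
    float s[32];
};

/*
 * Context switch: save the outgoing FP context only if the FPU was enabled
 * (i.e. an FP instruction ran since the last switch), restore the incoming
 * context if one is marked as saved, then leave the FPU disabled.
 */
void model_context_switch(struct model_fpu *fpu,
                          struct model_thread *out,
                          struct model_thread *in)
{
    if (fpu->fpexc & FPEXC_EN) {
        memcpy(out->s, fpu->s, sizeof(out->s));
        out->fpscr = fpu->fpscr;
        out->fp_ctx_saved = true;
    }

    if (in->fp_ctx_saved) {
        /* The mark stays set: the saved copy remains valid until the
         * thread uses the FPU again and is saved at its next switch-out. */
        memcpy(fpu->s, in->s, sizeof(fpu->s));
        fpu->fpscr = in->fpscr;
    }

    fpu->fpexc &= ~FPEXC_EN;
}

/*
 * Undefined instruction trap caused by an FP instruction. Returns true if
 * the FPU was enabled and the faulting instruction should be re-executed,
 * false if the fault is genuine.
 */
bool model_fp_trap(struct model_fpu *fpu, const struct model_thread *cur)
{
    if (fpu->fpexc & FPEXC_EN) {
        return false;   /* FPU already enabled: not a lazy-enable trap */
    }
    if (!(cur->options & MODEL_K_FP_REGS)) {
        return false;   /* thread is not allowed to use FP registers */
    }
    fpu->fpexc |= FPEXC_EN;
    return true;
}
```

In this model a thread that never executes an FP instruction costs nothing extra at context switch time, which is the point of the lazy scheme. A real port would additionally have to enable the FPU temporarily around the restore, since the registers are not accessible while FPEXC.EN=0.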

Note

  • This feature will also be applicable to the Cortex-A port when it is added in the future, as the Cortex-R architecture is very similar to Cortex-A.
  • Specification 3-ii should really apply to Cortex-M as well.
@stephanosio stephanosio added Feature Request A request for a new feature area: Kernel area: ARM ARM (32-bit) Architecture labels Oct 21, 2019
@stephanosio stephanosio self-assigned this Oct 21, 2019
@stephanosio stephanosio mentioned this issue Nov 12, 2019
25 tasks
@carlescufi carlescufi added Feature A planned feature with a milestone and removed Feature Request A request for a new feature labels Mar 24, 2020
@stephanosio stephanosio added this to the v2.4.0 milestone May 11, 2020
@bbolen
Collaborator

bbolen commented Jul 22, 2020

@stephanosio I found your https://github.com/stephanosio/zephyr/commits/aarch32_non_m_fp_alt branch and pulled those changes. I wanted to see if you have done any more work on this issue, specifically have you implemented anything mentioned in the "Specifications 3-7" section above?

@stephanosio
Member Author

stephanosio commented Jul 23, 2020

> @stephanosio I found your https://github.com/stephanosio/zephyr/commits/aarch32_non_m_fp_alt branch and pulled those changes. I wanted to see if you have done any more work on this issue, specifically have you implemented anything mentioned in the "Specifications 3-7" section above?

@bbolen There is a Cortex-A (and -R) FP sharing implementation that sort of works here:
https://github.com/ibirnbaum/zephyr/blob/armv7_cortex_a/arch/arm/core/aarch32/swap_helper.S

@bbolen
Collaborator

bbolen commented Jul 27, 2020

@stephanosio thank you

@bbolen
Collaborator

bbolen commented Oct 12, 2020

Can you elaborate on the description above with respect to items 5 and 6? I'm struggling to understand why the vfp registers would be saved on the exception stack and not the thread context in the normal case. I can see needing to temporarily put it on the exception stack in order for the exception handler to use the VFP unit, but I would assume they would be popped off that stack and pushed onto the thread context during the context switch.

@bbolen
Collaborator

bbolen commented Jan 11, 2021

Here is a working implementation of floating point support for Cortex-R. It does lazy context switching. It is based on v2.3.0. There are some conflicts with HEAD, but it will be a while before I can get around to looking at those. I'm putting this out there in case others need a starting point for FPU support before I can get this merge worthy.

https://github.com/bbolen/zephyr/commits/cortex_r_fpu

@stephanosio stephanosio removed this from the v2.4.0 milestone Jun 11, 2021
@legath

legath commented Aug 9, 2021

Some R4F SoCs support double precision. For example, the TI Hercules brochure states:

Floating Point Unit (FPU)

  • FPU is compliant to IEEE754
  • 16 double-word (64 bits) registers
  • 32 single-word (32 bits) registers
  • Supports features:
    • Single-precision and double-precision add, subtract, multiply, divide, multiply and accumulate, and square root operations
    • Conversions between fixed-point and floating-point data formats, etc
    • Comparisons
    • Underflow
    • Exceptions

@stephanosio stephanosio assigned stephanosio and bbolen and unassigned stephanosio Aug 13, 2021
@shrhrw

shrhrw commented Sep 13, 2021

Hi @bbolen , I am working on supporting a cortex-r5f chip in zephyr, and I've spliced in your code from this post: #19979 (comment)

I have it building, however when trying to flash it onto the board I am encountering the following error:
Debug: 387 144 cortex_a.c:301 cortex_a_exec_opcode(): exec opcode 0xee000e15
Debug: 388 145 armv4_5.c:496 arm_set_cpsr(): set CPSR 0x000003db: Undefined instruction mode, ARM state

When looking at the ARM documentation here: https://developer.arm.com/documentation/ddi0406/b/System-Level-Architecture/The-System-Level-Programmers--Model/Exceptions/Undefined-Instruction-exception?lang=en

I found this section:
The Undefined Instruction exception can be used for:

  • software emulation of a coprocessor in a system that does not have the physical coprocessor hardware
  • lazy context switching of coprocessor registers
  • general-purpose instruction set extension by software emulation
  • signaling an illegal instruction execution
  • division by zero errors.

Do you know if my error is a coincidence, or related in the way described? If so, do you have a suggestion?

@bbolen
Collaborator

bbolen commented Sep 15, 2021

It could be related. The FPU is usually disabled. When the code reaches a floating-point instruction, an undefined instruction exception occurs, the FPU gets enabled, and execution resumes at the floating-point instruction that caused the fault. So one undefined instruction exception would be expected when using floating point, but it wouldn't crash anything.

I'm unavailable for the rest of the week, but I'll look closer at your details above on Monday.
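
One way to check whether a given undefined instruction exception belongs to the lazy FPU enable path is to look at the coprocessor number of the faulting opcode. The sketch below assumes the classic ARM (A32) coprocessor instruction encoding, where bits [11:8] hold the coprocessor number and the VFP instructions use coprocessors 10 and 11. The helper names are hypothetical, and Advanced SIMD data-processing encodings are deliberately not covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Coprocessor number: bits [11:8] of an A32 coprocessor instruction. */
static inline unsigned int insn_coproc(uint32_t insn)
{
    return (insn >> 8) & 0xFu;
}

/*
 * True if the opcode lies in the coprocessor instruction space:
 * bits [27:25] == 0b110 (coprocessor load/store and 64-bit transfers) or
 * bits [27:24] == 0b1110 (CDP, MCR, MRC).
 */
static inline bool insn_is_coproc(uint32_t insn)
{
    uint32_t op = (insn >> 24) & 0xFu;

    return (op & 0xEu) == 0xCu || op == 0xEu;
}

/* True if the opcode is a coprocessor instruction aimed at CP10 or CP11. */
static inline bool insn_targets_vfp(uint32_t insn)
{
    unsigned int cp = insn_coproc(insn);

    return insn_is_coproc(insn) && (cp == 10u || cp == 11u);
}
```

Applied to the opcode in the debug log quoted earlier in this thread, `insn_coproc(0xee000e15)` yields 14 rather than 10 or 11, so that particular opcode is not a VFP instruction under this decoding. By contrast, `0xee300a20` (`vadd.f32 s0, s0, s1`) has coprocessor number 10 and would be the kind of instruction that triggers the lazy enable.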

@shrhrw

shrhrw commented Sep 15, 2021

`Open On-Chip Debugger 0.11.0+dev-00242-g7036ed509-dirty (2021-08-03-17:04)
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : TI BE-32 quirks mode is enabled
Info : XDS110: connected
Info : XDS110: vid/pid = 0451/bef3
Info : XDS110: firmware version = 3.0.0.16
Info : XDS110: hardware version = 0x0029
Info : XDS110: connected to target via JTAG
Info : XDS110: TCK set to 2500 kHz
Info : clock speed 1500 kHz
Info : JTAG tap: tms570.jrc tap/device found: 0x1b95a02f (mfg: 0x017 (Texas Instruments), part: 0xb95a, ver: 0x1)
Info : JTAG tap: tms570.cpu enabled
Info : tms570.cpu: hardware has 8 breakpoints, 8 watchpoints
Info : starting gdb server for tms570.cpu on 3333
Info : Listening on port 3333 for gdb connections
TargetName Type Endian TapName State


0* tms570.cpu cortex_r4 big tms570.cpu running

Info : JTAG tap: tms570.jrc tap/device found: 0x1b95a02f (mfg: 0x017 (Texas Instruments), part: 0xb95a, ver: 0x1)
Info : JTAG tap: tms570.cpu enabled
Warn : tms570.cpu: ran after reset and before halt ...
Info : tms570.cpu: MPIDR level2 0, cluster 0, core 0, mono core, no SMT
target halted in ARM state due to debug-request, current mode: Undefined instruction
cpsr: 0x000003db pc: 0x00000004
D-Cache: disabled, I-Cache: disabled
flash
flash bank bank_id driver_name base_address size_bytes chip_width_bytes
bus_width_bytes target [driver_options ...]
flash banks
flash init
flash list
gdb_flash_program ('enable'|'disable')
nand
program [address] [pre-verify] [verify] [reset] [exit]

Info : XDS110: disconnected
FATAL ERROR: command exited with status 1`

-- Application: /home/smith/zephyrproject/zephyr/samples/hello_world
-- Zephyr version: 2.7.0-rc1 (/home/smith/zephyrproject/zephyr), build: v1.12.0-34809-g29387287d9f7
-- Found Python3: /usr/bin/python3.8 (found suitable exact version "3.8.10") found components: Interpreter
-- Found west (found suitable version "0.11.1", minimum required is "0.7.1")
-- Board: hercules_tms570lc43x
-- Cache files will be written to: /home/smith/.cache/zephyr
-- Using toolchain: zephyr 0.13.0 (/home/smith/zephyr-sdk-0.13.0)
-- Open On-Chip Debugger 0.11.0+dev-00358-g6c1e1a212-dirty (2021-08-26-13:54)

My local zephyr repo was cloned from your repo here: https://github.com/bbolen/zephyr/commits/cortex_r_fpu and I updated it to the latest version.

5 participants