Skip to content

Cortex-R Floating Point Support #44753

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
May 5, 2022

Conversation

bbolen
Copy link
Collaborator

@bbolen bbolen commented Apr 11, 2022

This PR adds support for the Cortex-R VFP unit. It can also be used to enable floating point support for Cortex-A, though that is untested. A couple of the basic ideas behind this PR:

  1. Kconfig options are used to select the VFP options for the Cortex-R implemented in the SoC. FPU/FPU_SHARING are still used to determine whether or not to pass the hard or soft float flags to the compiler.
  2. The space for the floating point registers is always allocated on the process stack during context switches, but the registers are not saved unless necessary.

I pulled (and heavily modified) a couple of patches @stephanosio pointed me to a long time ago. Let me know if you want me to change the author so that git blame won't implicate you if something goes wrong.

Closes #19979

@github-actions github-actions bot added area: API Changes to public APIs area: ARM ARM (32-bit) Architecture area: Build System area: Kernel area: Tests Issues related to a particular existing or missing test labels Apr 11, 2022
peter-mitsis
peter-mitsis previously approved these changes Apr 12, 2022
Copy link
Member

@stephanosio stephanosio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok in general.

Just a few comments.

@@ -21,21 +21,21 @@ tests:
tags: fpu kernel
timeout: 600
kernel.fpu_sharing.generic.riscv32:
extra_args: PI_NUM_ITERATIONS=500
extra_args: PI_NUM_ITERATIONS=500 DISABLE_INT_TEST=1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the rationale behind disabling the timer testcase for these platforms?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those particular platforms fail the test. Unfortunately, I don't enough about these platforms to fix them myself. I updated the commit message to note this.

Copy link
Member

@stephanosio stephanosio May 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make 48b8439a0d4123d8abf83d08e170dbcc798e68cd a separate PR? This needs a more thorough review from the maintainers of all these platforms.

p.s. it also fails on ARC when testing with nSIM. succeeds with PI_NUM_ITERATIONS=500.
p.s. 2. riscv32 and riscv64 succeed also with PI_NUM_ITERATIONS=500 ... anyways, let's review this in a separate PR so as to not block this.

@bbolen bbolen force-pushed the cortex_r_fpu2 branch 2 times, most recently from ff42d22 to 20c0bbf Compare April 13, 2022 01:21
@povergoing
Copy link
Member

Just wonder, these are only for v7r vfp, right?

@stephanosio
Copy link
Member

stephanosio commented May 2, 2022

@julien-massot @povergoing FYI, this needs to be extended to AARCH32_ARMV8_R as well.

@microbuilder
Copy link
Member

I was planning to do that while I was working on the Cortex-R stuff, but never got to do it. We should definitely do it at some point.

Perhaps during the 3.2 cycle?

@@ -21,21 +21,21 @@ tests:
tags: fpu kernel
timeout: 600
kernel.fpu_sharing.generic.riscv32:
extra_args: PI_NUM_ITERATIONS=500
extra_args: PI_NUM_ITERATIONS=500 DISABLE_INT_TEST=1
Copy link
Member

@stephanosio stephanosio May 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make 48b8439a0d4123d8abf83d08e170dbcc798e68cd a separate PR? This needs a more thorough review from the maintainers of all these platforms.

p.s. it also fails on ARC when testing with nSIM. succeeds with PI_NUM_ITERATIONS=500.
p.s. 2. riscv32 and riscv64 succeed also with PI_NUM_ITERATIONS=500 ... anyways, let's review this in a separate PR so as to not block this.

@stephanosio stephanosio added the Release Notes To be mentioned in the release notes label May 2, 2022
Bradley Bolen and others added 11 commits May 2, 2022 10:35
Reuse the Cortex-M paths for testing the floating point unit.

Signed-off-by: Bradley Bolen <[email protected]>
For testing, assume that the Cortex-A/R platforms are using a GIC
interrupt controller.  Use the last GIC SGI to trigger an interrupt for
the test.

Signed-off-by: Bradley Bolen <[email protected]>
This commit adds the unified floating-point configuration symbols for
the ARM architectures.

These configuration symbols allow specification of the floating-point
coprocessors, such as VFP (also known as FP for Cortex-M) and NEON,
for the ARM architectures.

Signed-off-by: Stephanos Ioannidis <[email protected]>
This commit updates the Zephyr build system to support specifying
advanced floating-point compilation options derived from the newly
introduced unified floating-point configurations.

The following changes are introduced by this commit:

1. Specify architecture floating-point option to the `-mcpu` flag.
2. Specify floating-point unit (FPU) type using the `-mfpu` flag.

Note that the `-march` flag is not specified separately because the
`-mcpu` flag provides more detailed architecture options and this
makes the `-march` flag redundant.

Signed-off-by: Stephanos Ioannidis <[email protected]>
When Dual-redundant Core Lock-step (DCLS) topology is used, the VFP
registers across the two redundant cores must be manually initialised
and synchronised, and this requires the `-mfloat-abi=hard` option to
be specified.

This commit forces the use of FP "hard" ABI on the VFP-equipped cores
that are configured in DCLS topology.

Signed-off-by: Stephanos Ioannidis <[email protected]>
Cortex-A/R use a descending stack frame and the hardware does not help
with the stacking.  This led to some less than desirable workarounds in
the exception code where the basic stack frame was saved twice.
Rearranging the order of the exception stack frame removes that problem
and provides a clearer path to saving CPU context in a fully descending
manner.

Signed-off-by: Bradley Bolen <[email protected]>
Grouping the FPU registers together will make adding FPU support for
Cortex-A/R easier later.  It provides the ability to get the sizeof and
offsetof FPU registers easier.

Signed-off-by: Bradley Bolen <[email protected]>
This will enable the VFP unit on boot to handle the case where
FPU_SHARING is not enabled.

Signed-off-by: Bradley Bolen <[email protected]>
This commit updates the Cortex-R reset routine to initialise
(synchronise) the VFP D16-D31 registers when Dual-redundant Core
Lock-step (DCLS) is enabled.

Signed-off-by: Stephanos Ioannidis <[email protected]>
This adds lazy floating point context switching.  On svc/irq entrance,
the VFP is disabled and a pointer to the exception stack frame is saved
away.  If the esf pointer is still valid on exception exit, then no
other context used the VFP so the context is still valid and nothing
needs to be restored.  If the esf pointer is NULL on exception exit,
then some other context used the VFP and the floating point context is
restored from the esf.

The undefined instruction handler is responsible for saving away the
floating point context if needed.  If the handler is in the first
irq/svc context and the current thread uses the VFP, then the float
context needs to be saved.  Also, if the handler is in a nested context
and the previous context was using the FVP, save the float context.

Signed-off-by: Bradley Bolen <[email protected]>
This SoC supports vfpv3-d16 with single and double precision and 16
64-bit registers.

Signed-off-by: Bradley Bolen <[email protected]>
@bbolen
Copy link
Collaborator Author

bbolen commented May 2, 2022

@stephanosio I created #45286 to review the new interleaved FPU test separately and fixed up the additional issues you pointed out. Thanks.

@microbuilder
Copy link
Member

Approved too soon, there are some compliance check issues, but should be easy to address.

@stephanosio
Copy link
Member

there are some compliance check issues

This warning can be ignored since typedef usage here is justified.

-:1022: WARNING:NEW_TYPEDEFS: do not add new typedefs
#1022: FILE: arch/arm/include/kernel_arch_data.h:48:
+typedef struct __fpu_sf _fpu_sf_t;

@povergoing
Copy link
Member

@julien-massot @povergoing FYI, this needs to be extended to AARCH32_ARMV8_R as well.

Yes, agreed. @SgrrZhf May take a look

@stephanosio stephanosio merged commit dfc4c3f into zephyrproject-rtos:main May 5, 2022
@bbolen bbolen deleted the cortex_r_fpu2 branch May 6, 2022 12:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: API Changes to public APIs area: ARM ARM (32-bit) Architecture area: Build System area: Kernel area: Tests Issues related to a particular existing or missing test Release Notes To be mentioned in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Cortex-R floating-point support
6 participants