Skip to content

arch: arm: Rewrite Cortex-R reset vector function. #20473

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

stephanosio
Copy link
Member

@stephanosio stephanosio commented Nov 8, 2019

This commit addresses the following issues:

  1. Add a new Kconfig configuration for specifying Dual-redundant Core
    Lock-step (DCLS) processor topology.

  2. Register initialisation is only required when Dual-redundant Core
    Lock-step (DCLS) is implemented in hardware. This initialisation is
    required on DCLS only because the architectural registers are in an
    indeterminate state after reset and therefore the initial register
    state of the two parallel executing cores are not guaranteed to be
    identical, which can lead to DCCM detecting it as a hardware fault.
    A conditional compilation check for this hardware configuration
    using the newly added CONFIG_CPU_HAS_DCLS flag has been added.

  3. The existing CPU register initialisation code did not take into
    account the banked registers for every execution mode. The new
    implementation ensures that all architectural registers of every
    mode are initialised.

  4. Add VFP register initialisation for when floating-point support is
    enabled and the core is configured in DCLS topology. This
    initialisation sequence is required for the same reason given in
    the first issue.

  5. Add provision for platform-specific initialisation on Cortex-R
    using PLATFORM_SPECIFIC_INIT config and z_platform_init function.

  6. Remove seemingly pointless and inadequately defined STACK_MARGIN.
    Not only does it violate the 8-byte stack alignment rule, it does
    not provide any form of real stack protection.

Signed-off-by: Stephanos Ioannidis [email protected]

NOTE:

  • Separated out from Cortex-R Port Improvement for 2.2 #19698 for timely review and merging.
  • This should be considered a bug fix, as most commercially available Cortex-R processors implement lock-step and the current reset code will fail to boot such processors.

@stephanosio
Copy link
Member Author

stephanosio commented Nov 8, 2019

This PR is related to the PR #20469, and should be merged after them.

@stephanosio
Copy link
Member Author

Force pushing to integrate the change in the PR #20470 into the commit in this PR in order to prevent unintentional nuking of CPU_HAS_DCLS.

@zephyrbot zephyrbot added area: ARM ARM (32-bit) Architecture area: API Changes to public APIs labels Nov 8, 2019
@ioannisg ioannisg added the bug The issue is a bug, or the PR is fixing a bug label Nov 8, 2019
@stephanosio stephanosio added this to the v2.2.0 milestone Dec 9, 2019
@ioannisg
Copy link
Member

ioannisg commented Jan 2, 2020

Needs rebase after merging the arch/arm/core code into the aarch32 sub-directory

@stephanosio stephanosio force-pushed the cortex_r_fix_reset_vector branch from 6e32494 to 6fa8eac Compare January 2, 2020 09:28
@stephanosio
Copy link
Member Author

Rebased

@stephanosio
Copy link
Member Author

Integrated #20478 into this PR.

Copy link
Member

@ioannisg ioannisg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok to me

* Ran when the system comes out of reset. The processor is in thread mode with
* privileged level. At this point, the main stack pointer (MSP) is already
* pointing to a valid area in SRAM.
* Ran when the system comes out of reset. The processor is in Supervisor mode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's actually accurate - there is no execution mode named "thread" in Cortex-R :)

mcr p15, 0, r0, c1, c0, 0
SECTION_SUBSEC_FUNC(TEXT, _reset_section, __start)

#if defined(CONFIG_CPU_HAS_DCLS)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a change from default behavior, i.e. when DCLS is not set. Could you comment why the clearing of common registers is not required in that case? Especially since you state that the processor architectural registers are in indeterminate state..

Copy link
Member Author

@stephanosio stephanosio Jan 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you comment why the clearing of common registers is not required in that case?

@ioannisg "Indeterminate register reset value" in itself is not really a problem from the program code perspective because all architectural registers are going to be, at some point, set to a known value before being used. Note that Cortex-A behaves the same way yet we do not initialise registers in the boot code because it is simply not necessary to do so.

The register initialisation sequence is necessary if and only if the Dual-redundant Core Lock-step topology is implemented, because the register values of the two redundant cores are not guaranteed to be identical due to the indeterminate nature of the register reset state, and this can cause the lock-step logic to detect it as a fault (think of the "initialisation" as "synchronisation"/"convergence" of the register states of the dual redundant cores).

    /*
     * Initialise CPU registers to a defined state if the processor is
     * configured as Dual-redundant Core Lock-step (DCLS). This is required
     * for state convergence of the two parallel executing cores.
     */

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I understood that; wanted to make sure you were aware of this update in the dflt behavior

@@ -414,6 +414,12 @@ config CPU_HAS_TEE
Execution Environment (e.g. when it has a security attribution
unit).

config CPU_HAS_DCLS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stephanosio one more thing; i wonder if this is as-generic-as-possible.
AFAIK there can be multi-level redundant lockstep configuration, e..g triple modular, with error correction with majority vote. So I wonder if we could use a generic Kconfig name for lockstep here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ioannisg Initially, I thought of adding something like CPU_HAS_LOCKSTEP, but decided to go with CPU_HAS_DCLS for now because "TCLS" or higher is simply unheard of and its practicality is questionable for general safety critical applications (i.e. automotive and industrial).

It is worth noting that, at the time of writing, there is not even a single commercially available solution that implements triple redundancy at core level (except the space-grade Cortex-R5 TCLS radiation-tolerant hack which is still WIP and only uses TCLS for RTL-level radiation tolerance instead of full fledged radiation hardening in PD), and it is almost always preferable to have multiple physically separate DCLS systems working in fail-over or majority voting configuration, as physical system component failure is a dominant factor when it comes to extreme reliability.

All that aside, DCLS and TCLS (or MCLS) handling can be quite different and I believe they should be specified using different symbols. For instance, the ARM DCLS requires software-level fault handling and core re-synchronisation for error recovery, whereas most of the error recovery process is handled in the hardware block for the TCLS (the space-grade implementation).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation here @stephanosio
So I am wondering; should we have this, for now, as an ARM-aRCH-only option?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I am wondering; should we have this, for now, as an ARM-aRCH-only option?

@ioannisg As far as I can see, there are quite a few archs that are currently supported by the Zephyr that can potentially make use of this symbol:

  1. ARC: https://www.synopsys.com/dw/ipdir.php?ds=arc-emfs
  2. NIOS II: https://nmi.org.uk/wp-content/uploads/2016/05/Altera_NMI-Func-Safety.pdf
  3. RISC-V (WIP): https://riscv.org/wp-content/uploads/2017/05/Wed0900am-RISC-V-Lockstep-Processor-Odiga.pdf

@stephanosio stephanosio closed this Jan 2, 2020
@stephanosio stephanosio reopened this Jan 2, 2020
Copy link
Collaborator

@galak galak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment about what the inline asm in soc/arm/xilinx_zynqmp/soc.c z_platform_init is doing.

This commit addresses the following issues:

1. Add a new Kconfig configuration for specifying Dual-redundant Core
   Lock-step (DCLS) processor topology.

2. Register initialisation is only required when Dual-redundant Core
   Lock-step (DCLS) is implemented in hardware. This initialisation is
   required on DCLS only because the architectural registers are in an
   indeterminate state after reset and therefore the initial register
   state of the two parallel executing cores are not guaranteed to be
   identical, which can lead to DCCM detecting it as a hardware fault.
   A conditional compilation check for this hardware configuration
   using the newly added CONFIG_CPU_HAS_DCLS flag has been added.

3. The existing CPU register initialisation code did not take into
   account the banked registers for every execution mode. The new
   implementation ensures that all architectural registers of every
   mode are initialised.

4. Add VFP register initialisation for when floating-point support is
   enabled and the core is configured in DCLS topology. This
   initialisation sequence is required for the same reason given in
   the first issue.

5. Add provision for platform-specific initialisation on Cortex-R
   using PLATFORM_SPECIFIC_INIT config and z_platform_init function.

6. Remove seemingly pointless and inadequately defined STACK_MARGIN.
   Not only does it violate the 8-byte stack alignment rule, it does
   not provide any form of real stack protection.

Signed-off-by: Stephanos Ioannidis <[email protected]>
This commit relocates the exception vector table address range
configuration routine that was previously implemented as part of
Cortex-R architecture reset function to SoC platform-specific
initialisation routine.

Signed-off-by: Stephanos Ioannidis <[email protected]>
@stephanosio stephanosio force-pushed the cortex_r_fix_reset_vector branch from 6d0d7ee to c0d4e45 Compare January 9, 2020 17:19
@stephanosio stephanosio requested a review from galak January 9, 2020 17:20
@ioannisg ioannisg merged commit d314253 into zephyrproject-rtos:master Jan 10, 2020
@stephanosio stephanosio deleted the cortex_r_fix_reset_vector branch April 23, 2020 06:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: API Changes to public APIs area: ARM ARM (32-bit) Architecture bug The issue is a bug, or the PR is fixing a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants