Arch arm fix is in isr #19688
@andrewboie please take a look at the changes in z_fatal_error() (kernel/fatal.c). This is a proposal of mine; feel free to comment or reject.
All checks are passing now. checkpatch (informational only, not a failure)
Force-pushed from f5128af to 4676679
Force-pushed from 96c7e56 to c43c05e
Force-pushed from c43c05e to 919988f
Force-pushed from dfda5f7 to 7e146f9
kernel/fatal.c (outdated)
* during ISR exit, but in this case the thread should be
* aborted.
*/
if (z_arch_is_in_nested_exception(esf)) {
@andrewboie this might not actually be ideal, so I'd like your thoughts here.
I think this works deterministically, because MPU-based stack overflow detection is supported only for threads, so there's no way we end up here with both K_ERR_STACK_CHK_FAIL and a nested exception.
To make it more deterministic, we could have two error codes for stack corruption, one for sentinel check and one for HW-based check.
I think if we get this far we should unconditionally k_panic()
That's OK with me.
Force-pushed from 7e146f9 to 16190b5
Force-pushed from 16190b5 to c8ddeb8
Looks good, these changes make sense.
I still need to test this code on some HW, which is why I marked it as "request changes".
Looking forward, @agansari
Force-pushed from c8ddeb8 to 6ca7986
#endif
z_arm_fatal_error(reason, esf);
/* Copy ESF */
memcpy(&esf_copy, esf, sizeof(z_arch_esf_t));
Why do we need to make a copy of the ESF?
AFAIK every other platform just passes the original stack frame along.
The main reason is that kernel/fatal.c needs to query whether this is a nested exception, and it needs to do so by inspecting the ESF. So the ESF must hold correct information. The problem with the ARM ESF is that it may originally be corrupted, e.g. due to a stack overflow.
The only GENERIC way for ARM software to detect whether it is in a nested exception is to inspect the EXC_RETURN value that is placed in LR upon exception entry. But the EXC_RETURN value is neither part of the ESF, nor stored in fault.c (as a global state variable), nor passed to kernel/fatal.c as an argument.
So, unless we want to change the internal API of z_fatal_error(), for instance, by adding a "flags" parameter, we need the sole parameter, *esf, to hold all the required info. That's why I do the copy here.
If we defined z_fatal_error(*esf, some_generic_flags), we could pass the EXC_RETURN value or the nested_exception boolean value, but all of these require cross-arch changes.
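To illustrate the EXC_RETURN check mentioned above, here is a minimal sketch of how the nested-exception flag could be derived if the EXC_RETURN value (or a flag computed from it) were made available to the kernel. It relies on bit 3 of EXC_RETURN, the Mode bit on ARMv7-M/ARMv8-M: 1 means the exception returns to Thread mode, 0 means it returns to Handler mode, i.e. it preempted another exception. The macro and helper names are hypothetical, not Zephyr APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for EXC_RETURN bit 3 (Mode):
 * 1 = return to Thread mode, 0 = return to Handler mode.
 */
#define EXC_RETURN_MODE_THREAD_BIT (1UL << 3)

/* Hypothetical helper: true if the current exception preempted
 * another exception, i.e. we will return to Handler mode.
 */
static inline bool exc_return_is_nested(uint32_t exc_return)
{
	return (exc_return & EXC_RETURN_MODE_THREAD_BIT) == 0U;
}
```

On ARMv7-M, 0xFFFFFFF1 (return to Handler mode, MSP) yields true, while 0xFFFFFFF9 and 0xFFFFFFFD (return to Thread mode on MSP or PSP) yield false.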
arch/arm/include/cortex_m/exc.h (outdated)
return (esf->basic.xpsr & IPSR_ISR_Msk) ? (true) : (false);
}

#define z_arch_is_in_nested_exception z_arm_is_in_nested_exception
Don't do this; it's impossible to prototype properly.
Define the inline function as z_arch_is_in_nested_exception() if we're going to make this part of the arch interface.
Fine, I will change this.
Changed now
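The reviewer's suggestion, a real inline function instead of a `#define` alias, can be sketched as follows. The check itself is the one from the diff: the xPSR stacked in the ESF carries the IPSR of the preempted context, so a non-zero exception number there means the fault interrupted another exception handler. The ESF struct below is a reduced mock (the real `z_arch_esf_t` has more fields), and `IPSR_ISR_Msk` is redefined here to match the CMSIS mask for bits [8:0]; both are assumptions made so the snippet builds on a host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mock of the CMSIS IPSR exception-number mask (bits [8:0]). */
#define IPSR_ISR_Msk (0x1FFUL)

/* Reduced mock of the Cortex-M exception stack frame. */
typedef struct {
	struct {
		uint32_t pc;
		uint32_t xpsr;
	} basic;
} z_arch_esf_t;

/* A properly prototyped inline function rather than a macro alias:
 * a non-zero exception number in the stacked xPSR means the faulting
 * context was itself an exception handler (nested exception).
 */
static inline bool z_arch_is_in_nested_exception(const z_arch_esf_t *esf)
{
	return (esf->basic.xpsr & IPSR_ISR_Msk) != 0U;
}
```

Because this reads the ESF, it is only as reliable as the frame itself, which is the motivation given earlier for copying the ESF before handing it to kernel/fatal.c.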
kernel/fatal.c (outdated)
@@ -117,5 +119,30 @@ void z_fatal_error(unsigned int reason, const z_arch_esf_t *esf)
__ASSERT(!k_is_in_isr(),
"Attempted to recover from a fatal error in ISR");
}

#if defined(z_arch_is_in_nested_exception)
I don't think this is the best way to do this.
We need to put a proper prototype in include/sys/arch_interface.h,
and either provide an implementation for every arch, or file a GH issue and introduce some short-lived Kconfig option, CONFIG_ARCH_HAS_IS_IN_NESTED_EXCEPTION or something like that, until we can get parity on all arches.
Fine with me. Since your work is now merged, I can introduce this short-lived Kconfig option, select it for the ARM architecture, and then add the prototype in arch_interface.h.
Done this way.
kernel/fatal.c (outdated)
* aborted.
*/
if (z_arch_is_in_nested_exception(esf)) {
LOG_ERR("Fault during interrupt handling\n");
This printout needs to be moved up, before k_sys_fatal_error_handler() is called
ok ;)
Done
kernel/fatal.c (outdated)
@@ -117,5 +119,30 @@ void z_fatal_error(unsigned int reason, const z_arch_esf_t *esf)
__ASSERT(!k_is_in_isr(),
Update this assertion.
Force-pushed from 39a6da6 to 4f77635
Force-pushed from 4f77635 to d1865cf
@andrewboie I took a second pass at the z_fatal_error() behavior; please re-review.
We add a useful inline comment in the SVC handler (written in assembly), which identifies one of the function return points a bit more clearly. Signed-off-by: Ioannis Glaropoulos <[email protected]>
Add some documentation for ARM-specific function z_do_kernel_oops, stating clearly that it is only invoked inside SVC context. We also comment on the validity of the supplied ESF. Signed-off-by: Ioannis Glaropoulos <[email protected]>
This commit refactors and cleans up __fault, so that the function:
- reduces to supplying MSP, PSP, and EXC_RETURN to the C function for fault handling
- simplifies itself, removing conditional implementation, i.e. based on ARM Secure firmware.
The reason for that is simple: it is much better to write the fault handling in C instead of assembly, so we do only what is strictly required in assembly. Therefore, the commit refactors the z_arm_fault() function as well, better organizing the different functional blocks, that is:
- unlocking interrupts
- retrieving the ESF
- asserting for HW errors
- printing additional error logs
The refactoring unifies the way the ESF is retrieved for the different Cortex-M variants and security execution states. Signed-off-by: Ioannis Glaropoulos <[email protected]>
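With __fault reduced to passing MSP, PSP, and EXC_RETURN to C, the C side has to pick the stack that holds the exception frame. A minimal sketch of that selection, assuming the plain (non-Secure-banked) case: bit 2 of EXC_RETURN (SPSEL) tells which stack the hardware stacked the frame on, 1 for PSP and 0 for MSP. The helper and macro names are hypothetical; the commit's actual z_arm_fault() also handles the ARMv8-M Secure/Non-Secure stack variants, which this sketch ignores.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name for EXC_RETURN bit 2 (SPSEL):
 * 1 = frame stacked on the Process Stack (PSP),
 * 0 = frame stacked on the Main Stack (MSP).
 */
#define EXC_RETURN_SPSEL_BIT (1UL << 2)

/* Hypothetical helper: return the address of the exception stack
 * frame, given the values __fault would hand over to C.
 * Secure/Non-Secure stack banking is deliberately not modeled.
 */
static inline uint32_t fault_esf_address(uint32_t msp, uint32_t psp,
					 uint32_t exc_return)
{
	return (exc_return & EXC_RETURN_SPSEL_BIT) ? psp : msp;
}
```

Keeping this decision in C, as the commit argues, makes it easy to extend with the Secure-state cases without touching the assembly stub.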
We re-implement the z_arch_is_in_isr function so it aligns with the implementation for other architectures, i.e. returning true whenever any IRQ or system exception is active. Signed-off-by: Ioannis Glaropoulos <[email protected]>
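The "any IRQ or system exception" semantics can be expressed as a check on the current exception number: IPSR is 0 in Thread mode and non-zero in any handler, whether a system exception such as HardFault (3) or SVCall (11), or an IRQ (16 and above). In the sketch below, the IPSR value is passed in as a parameter so it runs on a host; on target it would come from reading the IPSR register (e.g. via CMSIS `__get_IPSR()`). The helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true for any active exception, not only IRQs.
 * Bits [8:0] of IPSR hold the active exception number; 0 = Thread mode.
 */
static inline bool is_in_isr_from_ipsr(uint32_t ipsr)
{
	return (ipsr & 0x1FFU) != 0U;
}
```

Treating system exceptions as ISR context means that, for example, code running inside the SVC handler used for kernel oopses is also reported as running in an ISR.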
We introduce a Kconfig option to signify whether an Architecture has the capability of detecting whether execution is, currently, in a nested exception. Signed-off-by: Ioannis Glaropoulos <[email protected]>
We add an ARM internal API which allows the kernel to infer the execution mode we are going to return to after the current exception. Signed-off-by: Ioannis Glaropoulos <[email protected]>
In z_fatal_error() we invoke the arch-specific API that evaluates whether we are in a nested exception. We then use the result to log a message that the error occurred in an ISR. In non-test mode, we unconditionally panic if an exception has occurred in an ISR and the fatal error handler has not returned (apart from the case of an error in the stack sentinel check). Signed-off-by: Ioannis Glaropoulos <[email protected]>
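A simplified model of the flow discussed in the review may help: log "Fault during interrupt handling" before the fatal error handler runs, and, once control comes back to z_fatal_error(), panic for faults in ISR context, except for a stack sentinel failure, which may be reported during ISR exit but belongs to the thread being aborted. Everything here is a hypothetical host-side model: the reason codes, the mock handler, and the returned action are stand-ins, and the nesting flag is passed in rather than obtained from z_arch_is_in_nested_exception(esf) behind the (presumably) CONFIG_ARCH_HAS_IS_IN_NESTED_EXCEPTION guard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for Zephyr's fatal error reasons. */
enum fatal_reason { REASON_CPU_EXCEPTION, REASON_STACK_SENTINEL };
enum fatal_action { ACTION_ABORT_THREAD, ACTION_PANIC };

/* Mock fatal error handler that simply returns. */
static void mock_fatal_error_handler(enum fatal_reason reason)
{
	(void)reason;
}

static enum fatal_action fatal_error_sketch(enum fatal_reason reason,
					    bool in_nested_exception)
{
	if (in_nested_exception) {
		/* Log before the handler runs, as requested in review. */
		printf("Fault during interrupt handling\n");
	}

	mock_fatal_error_handler(reason);

	/* Aborting a thread cannot recover from a fault in ISR context,
	 * so panic; a stack sentinel failure detected during ISR exit is
	 * the exception, since the faulty thread should be aborted.
	 */
	if (in_nested_exception && reason != REASON_STACK_SENTINEL) {
		return ACTION_PANIC;
	}
	return ACTION_ABORT_THREAD;
}
```

Returning an action instead of calling k_panic() directly keeps the decision testable on a host; in the kernel the panic branch would not return.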
For ARM, Z_ARCH_EXCEPT triggers an SVC to induce a system error. This code block may be inlined, so, if we want to return from this error DIRECTLY to thread mode, e.g. if the system error occurred in ISR context and we are not aborting the current thread, we must instruct the compiler that the execution may continue after the inlined SVC. Therefore, we must remove the CODE_UNREACHABLE statements. Signed-off-by: Ioannis Glaropoulos <[email protected]>
This commit adds a new test suite in tests/arch/arm intended for testing system faults inside interrupts. Signed-off-by: Ioannis Glaropoulos <[email protected]>
Force-pushed from d1865cf to f6bd594
Took some time to re-review this patch.
Tested kernel and arch/arm on an M7:
andrei@vboc:~/zephyr$ ./scripts/sanitycheck -p mimxrt1050_evk -T tests/kernel
Renaming output directory to /home/andrei/zephyr/sanity-out.1
JOBS: 4
Building initial testcase list...
84 test configurations selected, 7 configurations discarded due to filters.
Adding tasks to the queue...
total complete: 84/ 84 100% skipped: 7, failed: 0
77 of 84 tests passed (100.00%), 0 failed, 7 skipped with 0 warnings in 603.35 seconds
In total 91 test cases were executed on 1 out of total 208 platforms (0.48%)
andrei@vboc:~/zephyr$ ./scripts/sanitycheck -p mimxrt1050_evk -T tests/arch/arm/
Renaming output directory to /home/andrei/zephyr/sanity-out.2
JOBS: 4
Building initial testcase list...
10 test configurations selected, 0 configurations discarded due to filters.
Adding tasks to the queue...
total complete: 10/ 10 100% skipped: 0, failed: 0
10 of 10 tests passed (100.00%), 0 failed, 0 skipped with 0 warnings in 103.07 seconds
In total 10 test cases were executed on 1 out of total 208 platforms (0.48%)
Also tested this patch in a different debugging environment; no strange behavior observed.
@andrewboie ping for revisiting this review :)
A multi-purpose PR which:
- documents z_do_kernel_oops for ARM, properly
- aligns z_arch_is_in_isr with other ARCHes, i.e. returning true for any ISR context
- fixes #17656 for ARM Cortex-M