Skip to content

Arch arm fix is in isr #19688

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Oct 24, 2019

Conversation

ioannisg
Copy link
Member

@ioannisg ioannisg commented Oct 8, 2019

A multi-purpose PR which

  • puts some order in the fault handling mechanism of ARM
  • documents z_do_kernel_oops for ARM, properly
  • removes assembly code from fault.S and moves it to C code, eventually removing conditional implementations for ARMv6 and ARMv7
  • aligns z_arch_is_in_isr with other ARCHes, i.e. returing true for any ISR context
  • implements the new internal API that evaluates whether an exception has occurred while in thread or handler mode
  • utilizes this API in kernel/fatal.c to decide whether to abort the current thread
  • implements an ARM-specific test for the above
  • fixes a bug in Z_ARCH_EXCEPT for ARM that was introducing by default a code-unreachable statement

Fixing #17656 for ARM Cortex-M

@ioannisg ioannisg requested a review from andrewboie October 8, 2019 12:52
@ioannisg
Copy link
Member Author

ioannisg commented Oct 8, 2019

@andrewboie please, take a look at the changes in z_fatal_error (kernel/fatal.c). This is a proposal of mine; feel free to comment/reject.
The main point of this PR is the k_is_in_isr() API support for ARM, the code clean-up and a bug fix in Z_ARCH_EXCEPT

@zephyrbot zephyrbot added area: ARM ARM (32-bit) Architecture area: API Changes to public APIs area: Tests Issues related to a particular existing or missing test area: Kernel labels Oct 8, 2019
@zephyrbot
Copy link
Collaborator

zephyrbot commented Oct 8, 2019

All checks are passing now.

checkpatch (informational only, not a failure)

-:641: WARNING:TABSTOP: Statements should start on a tabstop
#641: FILE: kernel/fatal.c:133:
+			 }

-:645: WARNING:SUSPECT_CODE_INDENT: suspect code indent for conditional statements (8, 24)
#645: FILE: kernel/fatal.c:137:
+	} else {
[...]
+			if (z_arch_is_in_nested_exception(esf)) {

-:649: WARNING:LONG_LINE_COMMENT: line over 80 characters
#649: FILE: kernel/fatal.c:141:
+				/* Abort the thread only on STACK Sentinel check fail. */

- total: 0 errors, 3 warnings, 827 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

Your patch has style problems, please review.

NOTE: Ignored message types: AVOID_EXTERNS BRACES CONFIG_EXPERIMENTAL CONST_STRUCT DATE_TIME FILE_PATH_CHANGES MINMAX NETWORKING_BLOCK_COMMENT_STYLE PRINTK_WITHOUT_KERN_LEVEL SPLIT_STRING VOLATILE

NOTE: If any of the errors are false positives, please report
      them to the maintainers.

Tip: The bot edits this comment instead of posting a new one, so you can check the comment's history to see earlier messages.

@ioannisg ioannisg force-pushed the arch_arm_fix_is_in_isr branch 5 times, most recently from f5128af to 4676679 Compare October 8, 2019 20:38
@ioannisg ioannisg added the bug The issue is a bug, or the PR is fixing a bug label Oct 8, 2019
@ioannisg ioannisg marked this pull request as ready for review October 8, 2019 20:39
@ioannisg ioannisg added this to the v2.1.0 milestone Oct 8, 2019
@ioannisg ioannisg requested a review from wentongwu October 8, 2019 20:44
@ioannisg ioannisg force-pushed the arch_arm_fix_is_in_isr branch 2 times, most recently from 96c7e56 to c43c05e Compare October 9, 2019 06:37
@ioannisg ioannisg force-pushed the arch_arm_fix_is_in_isr branch from c43c05e to 919988f Compare October 9, 2019 07:19
@ioannisg ioannisg requested a review from agansari October 9, 2019 09:53
@ioannisg ioannisg force-pushed the arch_arm_fix_is_in_isr branch 3 times, most recently from dfda5f7 to 7e146f9 Compare October 9, 2019 19:48
kernel/fatal.c Outdated
* during ISR exit, but in this case the thread should be
* aborted.
*/
if (z_arch_is_in_nested_exception(esf)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewboie this might actually be not ideal, so I'd like your thoughts here.
I think, this works in a deterministic way, because MPU-based stack overflow detection is supported only for threads, so there's no way we end up (here) with both K_ERR_STACK_CHK_FAIL and a nested exception

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make it more deterministic, we could have two error codes for stack corruption, one for sentinel check and one for HW-based check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we get this far we should unconditionally k_panic()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's OK with me.

@ioannisg ioannisg force-pushed the arch_arm_fix_is_in_isr branch from 7e146f9 to 16190b5 Compare October 10, 2019 07:38
@ioannisg ioannisg force-pushed the arch_arm_fix_is_in_isr branch from 16190b5 to c8ddeb8 Compare October 11, 2019 12:21
Copy link
Collaborator

@agansari agansari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, these changes make sense.

I still need to test this code on some HW, this is why i marked it as request changes.

@ioannisg
Copy link
Member Author

Looks good, these changes make sense.

I still need to test this code on some HW, this is why i marked it as request changes.

Looking forward, @agansari

@ioannisg ioannisg force-pushed the arch_arm_fix_is_in_isr branch from c8ddeb8 to 6ca7986 Compare October 13, 2019 23:10
#endif
z_arm_fatal_error(reason, esf);
/* Copy ESF */
memcpy(&esf_copy, esf, sizeof(z_arch_esf_t));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to make a copy of the ESF?
AFAIK every other platform just passes the original stack frame along.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason is that kernel/fatal.c needs to query, whether this a nested exception. And it needs to do it by inspecting the ESF. So the ESF must be holding correct information. The problem with the ARM ESF is that it may be, originally, corrupted due to e.g. stack overflow.

The only GENERIC way for ARM Software to detect if it is in a nested exception, is to inspect the EXC_RETURN value that is placed in LR upon exception entry. But EXC_RETURN value is neither part of ESF, nor stored in fault.c (as a global state variable), nor is it passed to kernel/fatal.c as argument.

So, unless we want to change the internal API of z_fatal_error(), for instance, by adding a "flags" parameter, we need the sole parameter, *esf, to hold all the required info. That's why I do the copy here.

if we define z_fatal_error(*esf, some_generic_flags) { }, we should be able to pass the EXC_RETURN value or the nested_exception boolean value, but all these require cross-arch changes.

return (esf->basic.xpsr & IPSR_ISR_Msk) ? (true) : (false);
}

#define z_arch_is_in_nested_exception z_arm_is_in_nested_exception
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't do this. it's impossible to prototype properly.
define the inline function as z_arch_is_in_nested_exception() if we're going to make this part of the arch interface.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine, I will change this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed now

kernel/fatal.c Outdated
@@ -117,5 +119,30 @@ void z_fatal_error(unsigned int reason, const z_arch_esf_t *esf)
__ASSERT(!k_is_in_isr(),
"Attempted to recover from a fatal error in ISR");
}

#if defined(z_arch_is_in_nested_exception)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this isn't the best way to do this.
we need to put a proper prototype in include/sys/arch_interface.h.
And either provide an implementation for every arch, or file a GH issue and introduce some short-lived Kconfig CONFIG_ARCH_HAS_IS_IN_NESTED_EXCEPTION or something like that until we can get parity on all arches.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine, with me. Since your work is now merged, i can introduce this short-lived Kconfig option, and select it for ARM architecture, and then add the prototype in arch_interface.h

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done this way.

kernel/fatal.c Outdated
* aborted.
*/
if (z_arch_is_in_nested_exception(esf)) {
LOG_ERR("Fault during interrupt handling\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This printout needs to be moved up, before k_sys_fatal_error_handler() is called

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

kernel/fatal.c Outdated
* during ISR exit, but in this case the thread should be
* aborted.
*/
if (z_arch_is_in_nested_exception(esf)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we get this far we should unconditionally k_panic()

kernel/fatal.c Outdated
@@ -117,5 +119,30 @@ void z_fatal_error(unsigned int reason, const z_arch_esf_t *esf)
__ASSERT(!k_is_in_isr(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update this assertion

@ioannisg ioannisg force-pushed the arch_arm_fix_is_in_isr branch 2 times, most recently from 39a6da6 to 4f77635 Compare October 17, 2019 18:27
@ioannisg ioannisg requested a review from andrewboie October 17, 2019 18:27
@ioannisg ioannisg force-pushed the arch_arm_fix_is_in_isr branch from 4f77635 to d1865cf Compare October 17, 2019 18:30
Copy link
Member Author

@ioannisg ioannisg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewboie I did a second try with the z_fatal_error behavior; please, re-review

@ioannisg ioannisg requested a review from agansari October 17, 2019 22:15
We add a useful inline comment in the SVC handler (written in
assembly), which identifies one of the function return points
a bit more clearly.

Signed-off-by: Ioannis Glaropoulos <[email protected]>
Add some documentation for ARM-specific function
z_do_kernel_oops, stating clearly that it is only
invoked inside SVC context. We also comment on
the validity of the supplied ESF.

Signed-off-by: Ioannis Glaropoulos <[email protected]>
This commit refactors and cleans up __fault, so the function
- reduces to supplying MSP, PSP, and EXC_RETURN to the C
  function for fault handling
- simplifies itself, removing conditional
  implementation, i.e. based on ARM Secure firmware,

The reason for that is simple: it is much better to write the
fault handling in C instead of assembly, so we really do only
what is strictly required, in assembly.

Therefore, the commit refactors the z_arm_fault() function
as well, organizing better the different functional blocks,
that is:
- unlocking interrupts
- retriving ESF
- asserting for HW errors
- printing additional error logs

The refactoring unifies the way the ESF is retrieved for the
different Cortex-M variants and security execution states.

Signed-off-by: Ioannis Glaropoulos <[email protected]>
We re-implement the z_arch_is_in_isr function
so it aligns with the implementation for other
ARCHEs, i.e. returning false whenever any IRQ
or system exception is active.

Signed-off-by: Ioannis Glaropoulos <[email protected]>
We introduce a Kconfig option to signify whether
an Architecture has the capability of detecting
whether execution is, currently, in a nested
exception.

Signed-off-by: Ioannis Glaropoulos <[email protected]>
We add an ARM internal API which allows the kernel to
infer the execution mode we are going to return after
the current exception.

Signed-off-by: Ioannis Glaropoulos <[email protected]>
In z_fatal_error() we invoke the arch-specific API that
evaluates whether we are in a nested exception. We then
use the result to log a message that the error occurred
in ISR. In non-test mode, we unconditionally panic, if
an exception has occurred in an ISR and the fatal error
handler has not returned (apart from the case of an
error in stack sentinel check).

Signed-off-by: Ioannis Glaropoulos <[email protected]>
For ARM, Z_ARCH_EXCEPT triggers an SVC to induce a system error.
This code block may be inlined, so, if we want to return from
this error DIRECTLY to thread mode, e.g. if the system error
occurred in ISR context and we are not aborting the current
thread, we must instruct the compiler that the execution
may continue after the inlined SVC. Therefore, we must remove
the CODE_UNREACHABLE statements.

Signed-off-by: Ioannis Glaropoulos <[email protected]>
This commit adds a new test suite in tests/arch/arm that
intends to testing system faults inside interrupts.

Signed-off-by: Ioannis Glaropoulos <[email protected]>
@ioannisg ioannisg force-pushed the arch_arm_fix_is_in_isr branch from d1865cf to f6bd594 Compare October 18, 2019 08:32
Copy link
Collaborator

@agansari agansari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took some time to re-review this patch.

Tested kernel and arch/arm on an M7:

andrei@vboc:~/zephyr$ ./scripts/sanitycheck -p mimxrt1050_evk -T tests/kernel
Renaming output directory to /home/andrei/zephyr/sanity-out.1
JOBS: 4
Building initial testcase list...
84 test configurations selected, 7 configurations discarded due to filters.
Adding tasks to the queue...
total complete:   84/  84  100%  skipped:    7, failed:    0
77 of 84 tests passed (100.00%), 0 failed, 7 skipped with 0 warnings in 603.35 seconds
In total 91 test cases were executed on 1 out of total 208 platforms (0.48%)

andrei@vboc:~/zephyr$ ./scripts/sanitycheck -p mimxrt1050_evk -T tests/arch/arm/
Renaming output directory to /home/andrei/zephyr/sanity-out.2
JOBS: 4
Building initial testcase list...
10 test configurations selected, 0 configurations discarded due to filters.
Adding tasks to the queue...
total complete:   10/  10  100%  skipped:    0, failed:    0
10 of 10 tests passed (100.00%), 0 failed, 0 skipped with 0 warnings in 103.07 seconds
In total 10 test cases were executed on 1 out of total 208 platforms (0.48%)

Also tested this patch on a different debugging environment, no strange behavior observed

@ioannisg
Copy link
Member Author

@andrewboie ping for re-visiting this review :)

@andrewboie andrewboie merged commit 4424356 into zephyrproject-rtos:master Oct 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: API Changes to public APIs area: ARM ARM (32-bit) Architecture area: Kernel area: Tests Issues related to a particular existing or missing test bug The issue is a bug, or the PR is fixing a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants