-
Notifications
You must be signed in to change notification settings - Fork 7.4k
libc: thread-safe newlib #21518
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libc: thread-safe newlib #21518
Conversation
Add newlib reent struct to each k_thread struct and set newlib's global impure_ptr to point to the reent struct of the current thread after context switch. Signed-off-by: Markus Bernd Moessner <[email protected]>
RFC #21519 |
Some checks failed. Please fix and resubmit. checkpatch issues
Tip: The bot edits this comment instead of posting a new one, so you can check the comment's history to see earlier messages. |
#ifdef CONFIG_NEWLIB_LIBC | ||
_impure_ptr = &_current->base.k_reent; | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that arch_swap
is only used for co-operative task switching. When a task is preempted, this function is not called.
_impure_ptr
should be updated in z_arm_pendsv
, which is the actual common task switching function.
Line 228 of swap_helper.S
would be a good place to add this.
zephyr/arch/arm/core/swap_helper.S
Lines 225 to 229 in 816d2c4
isb | |
#endif | |
ldr r4, =_thread_offset_to_callee_saved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to make this change in the context switch code for all arches or is there something special about ARM that _impure_ptr needs to be updated like this?
also what about SMP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to make this change in the context switch code for all arches
Yes, struct reent
is basically per-thread context for newlib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also what about SMP?
For ARM, SMP is not supported at the moment. I will look into this in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these changes impact all arches so I think we will need to close on this before it can be merged.
I don't see how a global _impure_ptr updated on context switch could ever work in an SMP system, surely newlib has something for this...does this need to be stored in threa-local storage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps we need to override __getreent()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps we need to override __getreent()?
Looks like that would be the correct approach for SMP.
One problem I see is that __getreent
is not declared __weak
, so it would not be override-able.
I wonder if Zephyr should create a separate fork of newlib for this.
@@ -493,6 +493,10 @@ struct _thread_base { | |||
u8_t cpu_mask; | |||
#endif | |||
|
|||
#ifdef CONFIG_NEWLIB_LIBC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be desirable to add a separate newlib config symbol that enables reent
support (e.g. CONFIG_NEWLIB_LIBC_REENT
).
This symbol can be default y if MULTITHREADING
so as to only enable reent
support when multi-threading is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is definitelly only one thread running I think that's a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah this would be good, we have some use-cases for disabling mulithreading and using Zephyr more like a HAL (bootloaders, for example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you think about about the other switch to reserve memory for the locks? Is it ok, to have one? Would you set the default value to 0? Actually, I dont like to set it to 0 as only the malloc_lock hooks work in this szenario, but I was concerned that existing applications could run out of memory in case I choose a too little default value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The feature seems sound (though I'm no expert on the newlib locking design), but the memory storage for the locks seems kinda wrong?
|
||
SYS_MEM_POOL_DEFINE(z_nl_lock_pool, NULL, | ||
NEWLIB_LOCK_FRAG_SIZE, NEWLIB_LOCK_POOL_SIZE, | ||
1, sizeof(void *), NEWLIB_LOCK_SECTION); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This creates a dependency between newlib and mempool that didn't exist before. Generally those have been either/or: an application will use a heap managed by mempool or by newlib, not both. Now we need to include both variants. Isn't there a way to repurpose the newlib heap code to do this?
And if there's not, you probably want to be looking at the Zephyr mem_slab and not mem_pool, as AFAICT all allocations are of the same object size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do have a pool of objects here, slab would be better...however, mem slabs can't be used from user mode. sys_mem_pool can, the whole object lives in user memory (we route it to the libc memory domain with NEWLIB_LOCK_SECTION)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used malloc in first place, but it's kind of heavy. Is there any action for me?
lib/libc/newlib/libc-hooks.c
Outdated
static LIBC_DATA SYS_SEM_DEFINE(nl_at_quick_exit_sem, 1, 1); | ||
static LIBC_DATA SYS_SEM_DEFINE(nl_tz_sem, 1, 1); | ||
static LIBC_DATA SYS_SEM_DEFINE(nl_dd_hash_sem, 1, 1); | ||
static LIBC_DATA SYS_SEM_DEFINE(nl_arc4random_sem, 1, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused: why does newlib need recursive locking in some subsystems and not others? Is that something specific to this patch or to newlib?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Newlib - not my idea
lib/libc/newlib/libc-hooks.c
Outdated
|
||
__ASSERT(lock, "failed to allocate memory for newlib lock"); | ||
|
||
(*lock)->pSemOrMtx = (void *) &((char *)lock)[sizeof(void *)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comments here as in the other init function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the pSemOrMtx
lib/libc/newlib/libc-hooks.c
Outdated
|
||
__ASSERT(lock, "failed to allocate memory for newlib lock"); | ||
|
||
(*lock)->pSemOrMtx = (void *) &((char *)lock)[sizeof(void *)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not understanding this function. You're taking a lock pointer as an argument (where is that defined?), but then throwing that value away and replacing it with a new heap block (which may be null, and is unchecked), then dereferencing whatever pointer happened to be stored in that uninitialized heap block to store a pointer into the same block?
I'm guessing that what you really want to be doing is allocating a block containing just the sem/mutex union and assigning that through the opaque pointer you're being passed?
Why is there a header containing pSemOrMtx if the pointer always points to the byte after its own address? Why not just cast the struct address in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're taking a lock pointer as an argument (where is that defined?), but then throwing that value away and replacing it with a new heap block
Yeah this is confusing me too, some more detail on the intention here would be helpful, maybe leave a comment if this is truly correct (although right now it looks like the allocated lock simply leaks)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the possible null pointer dereference
I'm guessing that what you really want to be doing is allocating a block containing just the sem/mutex union and assigning that through the opaque pointer you're being passed?
Why is there a header containing pSemOrMtx if the pointer always points to the byte after its own address? Why not just cast the struct address in the first place?
Your right, i've changed that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for looking into this, our newlib bindings have needed some attention for a while.
|
||
|
||
#if !defined(_RETARGETABLE_LOCKING) || \ | ||
CONFIG_NEWLIB_LIBC_DYNAMIC_LOCK_MEM_SIZE == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if this block isn't compiled? (i.e. someone set the dynamic lock mem size to 0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you think about about the other switch to reserve memory for the locks? Is it ok, to have one? Would you set the default value to 0? Actually, I dont like to set it to 0 as only the malloc_lock hooks work in this szenario, but I was concerned that existing applications could run out of memory in case I choose a too little default value
@stephanosio @andyross @andrewboie
This has an impact on all architectures. As described within the RFC, I've only added ARM (checkpatch shall complain until all others are there too, to avoid having a partial implementation going into Zephyr). My main intent was to bring up the issue, perhaps it wasn't a good idea to support the RFC with a PR as it draws more attention to the implementation than the feature. Let's get one step back:
If so, there are two ways one can achieve this.
Those functions go in libstdc++ there are no simple hooks - one has to either add a full OS / thread model to GCC / libstdc++, or tweak the single thread implementation by adding hooks which we can use like the newlib stuff. Pro
No worries to drop this PR in favour of something better. |
Not just great, it is absolutely imperative if we are going to do anything useful with the newlib. Maybe #21519 should be labeled a "bug" and "high priority" since this issue practically renders the newlib useless? I can see that there are many projects that require the newlib (e.g. net and gui) and this means that there is the possibility of them "randomly" crashing from the thread safety issues at any moment.
@pabigot This sounds like something that must be addressed before we can say C++ is supported in the Zephyr, alongside many other issues. |
Agreed, added to #18554. Zephyr has some features that make it difficult to guarantee mutex/thread-safety, regardless of language: ZLIs and meta-IRQs. For the purposes of newlib support we can ignore them. |
Add a config switch to adjust the size of the memory reserved for newlibs's dynamic locks. If size is set to 0, only malloc will be thread-safe. Add an an implementation for the locking hooks exposed by newlib. Signed-off-by: Markus Bernd Moessner <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is going to be a good step forward; thanks for taking it on. Just a couple minor non-blocking comments in addition to ones already raised.
IMO the commit messages don't benefit from (1/2) and (2/2) in the subject line.
It might be worth a link in the commit message to newlib documentation on how to support reentrancy, or at least a reference to the #21519 where there are pointers to such documentation.
In particular while looking at this I wanted to know why there were public symbols being defined with non-Zephyr implementation-reserved identifiers like __lock___foo
, and had to grep through the newlib source to get an answer. A comment above the definition noting that these are referenced from newlib when it's built for thread-support (assuming that's true) would help future maintainers understand what these things are.
For style the number of blank lines between definitions isn't consistent (one separation has four). Also I don't think Zephyr generally adds a space in (t)x
casts. Using uncrustify could help reveal issues.
@pabigot Regarding the libstdc++ issue which came up during the discussion here - shouldn't we open a separate issue for that? |
It's nice to have it in the github issue, but when we come back to this in six months for maintenance it'd be nice to have something in the code or commit message. Going back to the issue and PR given a commit SHA1 is not particularly difficult, but it's not trivial either. Mentioning the RFC issue number as
Perhaps. There is a tie to this issue in #18554. I'm not clear on exactly what else needs to be done for C++ support. I don't believe we're ever going to get C++ threads to be supported by Zephyr: there's too much resistance to C++ at the project level, and I don't believe the thread model is compatible. |
@kaidoho this PR seems a bit stuck. An option to move it forward is to add the "Dev-review" label for it to be discussed in the dev review meeting. Another one is to continue the discussion here with @stephanosio and @andyross |
@carlescufi you are right, on the one hand I am waiting for directions and on the other I began to look into GCC / newlib to find out what it takes to have them support a Zephyr thread model. This will end in an RFC with the aim to have a custom toolchain for Zephyr. Perhaps, one would implement this PR differently when GCC / newlib has to be patched anyway. So, I think it is best to write the RFC regarding the toolchain, link this RFC/PR, and then see which direction the discussion takes. Ok, or would you do differently? |
@stephanosio @andyross @andrewboie @carlescufi @pabigot. |
The issue tracking this is currently tracked as "enhancement" and not "bug", and there isn't pressure applied at release time to resolve it since it doesn't contribute to the release bug count requirements. We rely solely on the motivation of the reporter/author to move it along. I think it should be promoted to "bug", but agree to scope it for 2.5 and find a dedicate owner to see it through if @kaidoho isn't working on it. |
This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time. |
This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time. |
Build newlib library to be thread-safe in multithreaded environment. zephyrproject-rtos/zephyr#21518 zephyrproject-rtos/zephyr#21519 zephyrproject-rtos/zephyr#36201 https://sourceware.org/legacy-ml/newlib/2016/msg01165.html https://sourceware.org/git/?p=newlib-cygwin.git;a=commit;h=bd54749095ee45d7136b6e7c8a1e5218749c87b6 Error log: newlib/libc-hooks.c:310:1: note: in expansion of macro 'BUILD_ASSERT' BUILD_ASSERT(IS_ENABLED(_RETARGETABLE_LOCKING), "Retargetable locking must be enabled"); Signed-off-by: Naveen Saini <[email protected]> Tested-by: Jon Mason <[email protected]>
The aim of this PR is to utilize the functionalty provided by newlib for thread-safety.
RFC #21519
Relevant Links
Documentation
Newlib Documentation on __malloc_lock
Newlib Documentation on Reentrancy
Mailing Lists
Newlib discussion on
--enable-newlib-retargetable-locking
[PATCH, newlib] Allow locking routine to be retargeted
What are the __retarget_lock functions?