libc: thread-safe newlib #21518

Closed
wants to merge 2 commits into from

Conversation

@kaidoho kaidoho commented Dec 19, 2019

Add newlib reent struct to each k_thread struct
and set newlib's global impure_ptr to point to the reent
struct of the current thread after context switch.
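
The mechanism described above can be sketched on a host machine as follows. This is only an illustration under stated assumptions: `mock_reent`, `mock_thread`, `mock_impure_ptr`, and `mock_context_switch` are hypothetical stand-ins for newlib's `struct _reent`, Zephyr's `struct k_thread`, newlib's `_impure_ptr`, and the architecture's context-switch path. None of it is the PR's actual code.

```c
/* Hypothetical stand-in for newlib's struct _reent (per-thread libc state). */
struct mock_reent {
    int err; /* plays the role of the per-thread errno */
};

/* Hypothetical stand-in for a kernel thread object that embeds its reent. */
struct mock_thread {
    const char *name;
    struct mock_reent reent;
};

/* Hypothetical stand-in for newlib's global _impure_ptr. */
static struct mock_reent *mock_impure_ptr;

/* Must run on every context switch, cooperative or preemptive. */
static void mock_context_switch(struct mock_thread *next)
{
    mock_impure_ptr = &next->reent;
}

/* A libc-style routine that records an error in the current thread's state. */
static void mock_set_errno(int value)
{
    mock_impure_ptr->err = value;
}
```

With two threads, switching to the first and calling `mock_set_errno(5)`, then switching to the second and calling `mock_set_errno(7)`, leaves each thread's `reent.err` holding its own value, because the global pointer follows the running thread.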

Signed-off-by: Markus Bernd Moessner <[email protected]>
@kaidoho
Author

kaidoho commented Dec 19, 2019

RFC #21519

@zephyrbot zephyrbot added area: C Library C Standard Library area: ARM ARM (32-bit) Architecture area: API Changes to public APIs area: Kernel labels Dec 19, 2019
@zephyrbot
Collaborator

zephyrbot commented Dec 19, 2019

Some checks failed. Please fix and resubmit.

checkpatch issues

-:198: ERROR:SPACING: space prohibited before that ',' (ctx:WxW)
#198: FILE: lib/libc/newlib/libc-hooks.c:360:
+		sys_sem_take((struct sys_sem *) lock , K_FOREVER);
 		                                     ^

-:216: ERROR:SPACING: space prohibited before that close parenthesis ')'
#216: FILE: lib/libc/newlib/libc-hooks.c:378:
+		sys_sem_give((struct sys_sem *) lock );

-:243: ERROR:POINTER_LOCATION: "(foo*)" should be "(foo *)"
#243: FILE: lib/libc/newlib/libc-hooks.c:405:
+		sys_mutex_lock((struct sys_mutex*) lock, K_FOREVER);

- total: 3 errors, 0 warnings, 239 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

Your patch has style problems, please review.

NOTE: Ignored message types: AVOID_EXTERNS BRACES CONFIG_EXPERIMENTAL CONST_STRUCT DATE_TIME FILE_PATH_CHANGES MINMAX NETWORKING_BLOCK_COMMENT_STYLE PRINTK_WITHOUT_KERN_LEVEL SPLIT_STRING VOLATILE

NOTE: If any of the errors are false positives, please report
      them to the maintainers.

Tip: The bot edits this comment instead of posting a new one, so you can check the comment's history to see earlier messages.

Comment on lines +57 to +59
#ifdef CONFIG_NEWLIB_LIBC
_impure_ptr = &_current->base.k_reent;
#endif
Member

Note that arch_swap is only used for co-operative task switching. When a task is preempted, this function is not called.

_impure_ptr should be updated in z_arm_pendsv, which is the actual common task switching function.

Line 228 of swap_helper.S would be a good place to add this.

isb
#endif
ldr r4, =_thread_offset_to_callee_saved

Contributor

@andrewboie andrewboie Dec 19, 2019

do we need to make this change in the context switch code for all arches or is there something special about ARM that _impure_ptr needs to be updated like this?

also what about SMP?

Member

@stephanosio stephanosio Dec 20, 2019

> do we need to make this change in the context switch code for all arches

Yes, struct reent is basically per-thread context for newlib.

https://github.com/bminor/newlib/blob/b61dc22adaf82114eee3edce91cc3433bcd27fe5/newlib/libc/include/sys/reent.h#L377-L424

Member

> also what about SMP?

For ARM, SMP is not supported at the moment. I will look into this in the future.

Contributor

these changes impact all arches so I think we will need to close on this before it can be merged.

I don't see how a global _impure_ptr updated on context switch could ever work in an SMP system, surely newlib has something for this...does this need to be stored in thread-local storage?

Contributor

perhaps we need to override __getreent()?

Member

@stephanosio stephanosio Dec 20, 2019

> perhaps we need to override __getreent()?

Looks like that would be the correct approach for SMP.

https://github.com/eblot/newlib/blob/2a63fa0fd26ffb6603f69d9e369e944fe449c246/newlib/libc/sys/linux/linuxthreads/getreent.c#L5-L10

One problem I see is that __getreent is not declared __weak, so it would not be override-able.

https://github.com/eblot/newlib/blob/2a63fa0fd26ffb6603f69d9e369e944fe449c246/newlib/libc/reent/getreent.c#L10-L14

I wonder if Zephyr should create a separate fork of newlib for this.
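
The SMP concern in this thread can be illustrated with a small host-side sketch. With more than one CPU, two threads run at the same time, so a single global pointer cannot describe both. A lookup function resolved on each call can. All names here (`mock_reent`, `mock_thread`, `mock_current`, `mock_getreent`) are hypothetical. newlib's real `__getreent()` takes no argument, so an actual port would have to determine the running CPU or thread internally.

```c
#define MOCK_NUM_CPUS 2

/* Hypothetical stand-in for newlib's struct _reent. */
struct mock_reent {
    int err;
};

/* Hypothetical thread object that embeds its own reent. */
struct mock_thread {
    struct mock_reent reent;
};

/* One "current thread" slot per CPU, as an SMP kernel would keep. */
static struct mock_thread *mock_current[MOCK_NUM_CPUS];

/*
 * Hypothetical per-CPU lookup. The cpu argument is passed explicitly
 * only to keep this sketch testable on a host.
 */
static struct mock_reent *mock_getreent(unsigned int cpu)
{
    return &mock_current[cpu]->reent;
}
```

Each CPU resolves its own thread's state, so writes through `mock_getreent(0)` and `mock_getreent(1)` never collide as long as the two slots point at different threads.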

@@ -493,6 +493,10 @@ struct _thread_base {
u8_t cpu_mask;
#endif

#ifdef CONFIG_NEWLIB_LIBC
Member

It would be desirable to add a separate newlib config symbol that enables reent support (e.g. CONFIG_NEWLIB_LIBC_REENT).

This symbol can be default y if MULTITHREADING so as to only enable reent support when multi-threading is enabled.

Author

If there is definitely only one thread running, I think that's a good idea.

Contributor

yeah this would be good, we have some use-cases for disabling multithreading and using Zephyr more like a HAL (bootloaders, for example)
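
The effect of the suggested symbol on the thread structure can be sketched in C. This is an illustration only: the preprocessor block imitates what a Kconfig "default y if MULTITHREADING" would produce, and `mock_reent`, `mock_thread_base`, and `mock_thread_has_reent` are hypothetical names. `CONFIG_NEWLIB_LIBC_REENT` is the example name proposed in the review, not an existing option.

```c
/* Pretend the build enabled multithreading. */
#define CONFIG_MULTITHREADING 1

/* Imitates "default y if MULTITHREADING" for the proposed symbol. */
#if defined(CONFIG_MULTITHREADING) && !defined(CONFIG_NEWLIB_LIBC_REENT)
#define CONFIG_NEWLIB_LIBC_REENT 1
#endif

/* Hypothetical stand-in for newlib's struct _reent. */
struct mock_reent {
    int err;
    char scratch[32];
};

/* Hypothetical thread base; the reent field exists only when enabled. */
struct mock_thread_base {
    unsigned char prio;
#ifdef CONFIG_NEWLIB_LIBC_REENT
    struct mock_reent k_reent;
#endif
};

/* Reports whether this build carries per-thread libc state. */
static int mock_thread_has_reent(void)
{
#ifdef CONFIG_NEWLIB_LIBC_REENT
    return 1;
#else
    return 0;
#endif
}
```

A single-threaded build (a bootloader, for instance) would leave both symbols undefined and pay no per-thread memory cost.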

Author

What do you think about the other switch, the one that reserves memory for the locks? Is it OK to have one? Would you set the default value to 0? Actually, I don't like setting it to 0, as only the malloc_lock hooks work in this scenario, but I was concerned that existing applications could run out of memory if I chose too small a default value.

Collaborator

@andyross andyross left a comment

The feature seems sound (though I'm no expert on the newlib locking design), but the memory storage for the locks seems kinda wrong?


SYS_MEM_POOL_DEFINE(z_nl_lock_pool, NULL,
NEWLIB_LOCK_FRAG_SIZE, NEWLIB_LOCK_POOL_SIZE,
1, sizeof(void *), NEWLIB_LOCK_SECTION);
Collaborator

This creates a dependency between newlib and mempool that didn't exist before. Generally those have been either/or: an application will use a heap managed by mempool or by newlib, not both. Now we need to include both variants. Isn't there a way to repurpose the newlib heap code to do this?

And if there's not, you probably want to be looking at the Zephyr mem_slab and not mem_pool, as AFAICT all allocations are of the same object size.

Contributor

we do have a pool of objects here, slab would be better...however, mem slabs can't be used from user mode. sys_mem_pool can, the whole object lives in user memory (we route it to the libc memory domain with NEWLIB_LOCK_SECTION)

Author

I used malloc in the first place, but it's kind of heavy. Is there any action for me here?
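
The point about same-sized allocations can be made concrete with a minimal fixed-block allocator. This is a generic free-list sketch, not Zephyr's mem_slab or sys_mem_pool API. `mock_lock`, `mock_block`, `MOCK_NUM_LOCKS`, and the `mock_slab_*` functions are hypothetical names, and the sketch does no locking of its own.

```c
#include <stddef.h>

#define MOCK_NUM_LOCKS 4

/* Hypothetical lock object; every allocation has this same size. */
struct mock_lock {
    int count;
};

/* A block is either a live lock or a link in the free list. */
union mock_block {
    struct mock_lock lock;
    union mock_block *next;
};

static union mock_block mock_slab[MOCK_NUM_LOCKS];
static union mock_block *mock_free_list;

/* Threads every block onto the free list. */
static void mock_slab_init(void)
{
    mock_free_list = NULL;
    for (size_t i = 0; i < MOCK_NUM_LOCKS; i++) {
        mock_slab[i].next = mock_free_list;
        mock_free_list = &mock_slab[i];
    }
}

/* Returns NULL when the slab is exhausted; callers must check. */
static struct mock_lock *mock_slab_alloc(void)
{
    union mock_block *blk = mock_free_list;

    if (blk == NULL) {
        return NULL;
    }
    mock_free_list = blk->next;
    blk->lock.count = 0;
    return &blk->lock;
}

/* Pushes a block back onto the free list. */
static void mock_slab_free(struct mock_lock *lock)
{
    union mock_block *blk = (union mock_block *)lock;

    blk->next = mock_free_list;
    mock_free_list = blk;
}
```

Because every block has the same size, allocation and release are constant-time pointer swaps with no fragmentation, which is why a slab-style allocator fits this use better than a general-purpose heap.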

static LIBC_DATA SYS_SEM_DEFINE(nl_at_quick_exit_sem, 1, 1);
static LIBC_DATA SYS_SEM_DEFINE(nl_tz_sem, 1, 1);
static LIBC_DATA SYS_SEM_DEFINE(nl_dd_hash_sem, 1, 1);
static LIBC_DATA SYS_SEM_DEFINE(nl_arc4random_sem, 1, 1);
Collaborator

I'm a little confused: why does newlib need recursive locking in some subsystems and not others? Is that something specific to this patch or to newlib?

Author

Newlib - not my idea
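
The difference the question is about can be shown with a single-threaded model. A recursive lock lets its current holder take it again, while a binary semaphore refuses a second take even from the same thread. newlib's retargetable locking interface exposes both flavors (for example `__retarget_lock_acquire` and `__retarget_lock_acquire_recursive`). Everything named `mock_*` below is hypothetical and only models the bookkeeping, with "would block" reported as `false`.

```c
#include <stdbool.h>

/* Hypothetical recursive lock: remembers its holder and nesting depth. */
struct mock_recursive_lock {
    int owner; /* thread id of the holder, -1 when free */
    int depth; /* how many times the holder has taken it */
};

/* Returns false where a real implementation would block. */
static bool mock_recursive_take(struct mock_recursive_lock *l, int tid)
{
    if (l->depth > 0 && l->owner != tid) {
        return false;
    }
    l->owner = tid;
    l->depth++;
    return true;
}

/* Releases one nesting level; frees the lock at depth zero. */
static void mock_recursive_give(struct mock_recursive_lock *l)
{
    if (l->depth > 0) {
        l->depth--;
        if (l->depth == 0) {
            l->owner = -1;
        }
    }
}

/* Hypothetical binary semaphore: 1 means free, 0 means taken. */
struct mock_binary_sem {
    int count;
};

/* A second take fails even when the same thread asks again. */
static bool mock_sem_take(struct mock_binary_sem *s)
{
    if (s->count == 0) {
        return false;
    }
    s->count = 0;
    return true;
}
```

A library path that re-enters its own locked region on the same thread would deadlock on the semaphore but proceed on the recursive lock, which is one plausible reason for a C library to ask for both kinds.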


__ASSERT(lock, "failed to allocate memory for newlib lock");

(*lock)->pSemOrMtx = (void *) &((char *)lock)[sizeof(void *)];
Collaborator

Same comments here as in the other init function.

Author

@kaidoho kaidoho Dec 20, 2019

Removed the pSemOrMtx


__ASSERT(lock, "failed to allocate memory for newlib lock");

(*lock)->pSemOrMtx = (void *) &((char *)lock)[sizeof(void *)];
Collaborator

I'm not understanding this function. You're taking a lock pointer as an argument (where is that defined?), but then throwing that value away and replacing it with a new heap block (which may be null, and is unchecked), then dereferencing whatever pointer happened to be stored in that uninitialized heap block to store a pointer into the same block?

I'm guessing that what you really want to be doing is allocating a block containing just the sem/mutex union and assigning that through the opaque pointer you're being passed?

Why is there a header containing pSemOrMtx if the pointer always points to the byte after its own address? Why not just cast the struct address in the first place?

Contributor

> You're taking a lock pointer as an argument (where is that defined?), but then throwing that value away and replacing it with a new heap block

Yeah this is confusing me too, some more detail on the intention here would be helpful, maybe leave a comment if this is truly correct (although right now it looks like the allocated lock simply leaks)

Author

Fixed the possible null pointer dereference

> I'm guessing that what you really want to be doing is allocating a block containing just the sem/mutex union and assigning that through the opaque pointer you're being passed?

> Why is there a header containing pSemOrMtx if the pointer always points to the byte after its own address? Why not just cast the struct address in the first place?

You're right, I've changed that.
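
The shape the reviewers describe (allocate only the lock object, check the result, and hand it back through the pointer the caller passed in) might look like the sketch below. It uses `malloc` purely to stay host-runnable. `mock_lock`, `mock_lock_init`, and `mock_lock_close` are hypothetical names, not the PR's revised code or newlib's hook signatures.

```c
#include <stdlib.h>

/* Hypothetical storage for the semaphore or mutex itself. */
struct mock_lock {
    int count;
};

/*
 * Allocates a lock and hands it back through the caller's pointer.
 * Returns 0 on success, -1 if allocation failed (*lock is then NULL).
 */
static int mock_lock_init(struct mock_lock **lock)
{
    struct mock_lock *obj = malloc(sizeof(*obj));

    *lock = obj;
    if (obj == NULL) {
        return -1;
    }
    obj->count = 1; /* starts out available */
    return 0;
}

/* Releases the lock and clears the caller's pointer. */
static void mock_lock_close(struct mock_lock **lock)
{
    free(*lock);
    *lock = NULL;
}
```

The allocated object is never dereferenced before the NULL check, and no header field pointing into the same block is needed because the caller's pointer already identifies the lock.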

Contributor

@andrewboie andrewboie left a comment

thanks for looking into this, our newlib bindings have needed some attention for a while.



#if !defined(_RETARGETABLE_LOCKING) || \
CONFIG_NEWLIB_LIBC_DYNAMIC_LOCK_MEM_SIZE == 0
Contributor

what happens if this block isn't compiled? (i.e. someone set the dynamic lock mem size to 0)

Author

What do you think about the other switch, the one that reserves memory for the locks? Is it OK to have one? Would you set the default value to 0? Actually, I don't like setting it to 0, as only the malloc_lock hooks work in this scenario, but I was concerned that existing applications could run out of memory if I chose too small a default value.

@stephanosio stephanosio added area: Architectures and removed area: ARM ARM (32-bit) Architecture labels Dec 20, 2019
@kaidoho
Author

kaidoho commented Dec 20, 2019

@stephanosio @andyross @andrewboie

> do we need to make this change in the context switch code for all arches or is there something special about ARM that _impure_ptr needs to be updated like this?
>
> also what about SMP?

This has an impact on all architectures. As described within the RFC, I've only added ARM (checkpatch shall complain until all others are there too, to avoid having a partial implementation going into Zephyr).

My main intent was to bring up the issue; perhaps it wasn't a good idea to support the RFC with a PR, as it draws more attention to the implementation than to the feature.

Let's take a step back:

  • Do you agree that it would be great to have a thread-safe newlib?

If so, there are two ways one can achieve this.

  1. The one I show within this PR. It leaves newlib "as is" and only adds the hooks and impure_ptr switching to Zephyr.
    Pro:
      • GNU ARM Embedded can be used without patches
      • Small changes which can be done in a short amount of time
  2. @stephanosio thought about patching newlib to have getreent available. That's partially what I considered when mentioning the alternative RTEMS route. Actually, I'd prefer to go the extra mile and have a target-OS-dependent toolchain (patching newlib and GCC for Zephyr). Why? Well, looking ahead, the next issue to arise is within libstdc++:

> The C++ library string functionality requires a couple of atomic operations to provide thread-safety. If you don't take any special action, the library will use stub versions of these functions that are not thread-safe. They will work fine, unless your applications are multi-threaded.
>
> If you want to provide custom, safe, versions of these functions, there are two distinct approaches. One is to provide a version for your CPU, using assembly language constructs. The other is to use the thread-safety primitives in your operating system.
> https://gcc.gnu.org/onlinedocs/libstdc++/manual/internals.html#internals.thread_safety

No strings == no go for me.

Those functions live in libstdc++, and there are no simple hooks: one has to either add a full OS / thread model to GCC / libstdc++, or tweak the single-thread implementation by adding hooks which we can use like the newlib stuff.

Pro

  • Addresses not only C but also C++

No worries to drop this PR in favour of something better.
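
The kind of atomic operation the quoted libstdc++ text refers to can be sketched with C11 atomics. This is neither libstdc++ code nor part of this PR. `mock_refcount`, `mock_exchange_and_add`, and `mock_release` are hypothetical names that only show the operation a thread-safe reference count needs: add a delta and get the previous value back in one indivisible step.

```c
#include <stdatomic.h>

/* Hypothetical reference count of a shared string buffer. */
typedef atomic_int mock_refcount;

/* Adds delta and returns the previous value, atomically. */
static int mock_exchange_and_add(mock_refcount *rc, int delta)
{
    return atomic_fetch_add(rc, delta);
}

/* Drops one reference; returns 1 if the caller released the last one. */
static int mock_release(mock_refcount *rc)
{
    return mock_exchange_and_add(rc, -1) == 1;
}
```

A non-atomic stub would read, modify, and write the counter in separate steps, so two threads releasing at once could both see the same old value. That is the failure the quoted documentation warns about for multi-threaded applications.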

@stephanosio
Member

> Do you agree that it would be great to have a thread-safe newlib?

Not just great, it is absolutely imperative if we are going to do anything useful with newlib.

Maybe #21519 should be labeled a "bug" and "high priority" since this issue practically renders newlib useless?

I can see that there are many projects that require newlib (e.g. net and gui), which means they could "randomly" crash from thread-safety issues at any moment.

> Those functions go in libstdc++ there are no simple hooks - one has to either add a full OS / thread model to GCC / libstdc++, or tweak the single thread implementation by adding hooks which we can use like the newlib stuff.

@pabigot This sounds like something that must be addressed before we can say C++ is supported in Zephyr, alongside many other issues.

@pabigot
Collaborator

pabigot commented Dec 20, 2019

> @pabigot This sounds like something that must be addressed before we can say C++ is supported in the Zephyr, alongside many other issues.

Agreed, added to #18554.

Zephyr has some features that make it difficult to guarantee mutex/thread-safety, regardless of language: ZLIs and meta-IRQs. For the purposes of newlib support we can ignore them.

Add a config switch to adjust the size of the memory reserved
for newlib's dynamic locks. If size is set to 0, only malloc
will be thread-safe.

Add an implementation for the locking hooks exposed by newlib.

Signed-off-by: Markus Bernd Moessner <[email protected]>
@zephyrbot zephyrbot added the area: ARM ARM (32-bit) Architecture label Dec 20, 2019
Collaborator

@pabigot pabigot left a comment

I think this is going to be a good step forward; thanks for taking it on. Just a couple minor non-blocking comments in addition to ones already raised.

IMO the commit messages don't benefit from (1/2) and (2/2) in the subject line.

It might be worth a link in the commit message to newlib documentation on how to support reentrancy, or at least a reference to the #21519 where there are pointers to such documentation.

In particular while looking at this I wanted to know why there were public symbols being defined with non-Zephyr implementation-reserved identifiers like __lock___foo, and had to grep through the newlib source to get an answer. A comment above the definition noting that these are referenced from newlib when it's built for thread-support (assuming that's true) would help future maintainers understand what these things are.

For style the number of blank lines between definitions isn't consistent (one separation has four). Also I don't think Zephyr generally adds a space in (t)x casts. Using uncrustify could help reveal issues.

@kaidoho
Author

kaidoho commented Dec 23, 2019

@pabigot Regarding the libstdc++ issue which came up during the discussion here - shouldn't we open a separate issue for that?

@pabigot
Collaborator

pabigot commented Dec 23, 2019

> @pabigot Documentation (at least what exists and is not in source), even the newlib discussions are linked in the RFC - and the first thing I did, was to interlink RFC and PR. I've now added a textual link and copied over the links to the documentation.

It's nice to have it in the github issue, but when we come back to this in six months for maintenance it'd be nice to have something in the code or commit message. Going back to the issue and PR given a commit SHA1 is not particularly difficult, but it's not trivial either. Mentioning the RFC issue number as #21519 in the commit message would make it easier; a short sentence in the code explaining things might even make it unnecessary.

> Regarding the libstdc++ issue which came up during the discussion here - shouldn't we open a separate issue for that?

Perhaps. There is a tie to this issue in #18554. I'm not clear on exactly what else needs to be done for C++ support. I don't believe we're ever going to get C++ threads to be supported by Zephyr: there's too much resistance to C++ at the project level, and I don't believe the thread model is compatible.

@carlescufi
Member

@kaidoho this PR seems a bit stuck. An option to move it forward is to add the "Dev-review" label for it to be discussed in the dev review meeting. Another one is to continue the discussion here with @stephanosio and @andyross

@kaidoho
Author

kaidoho commented Jan 16, 2020

@carlescufi you are right. On the one hand I am waiting for directions, and on the other I began to look into GCC / newlib to find out what it takes to have them support a Zephyr thread model. This will end in an RFC with the aim of having a custom toolchain for Zephyr. Perhaps one would implement this PR differently if GCC / newlib has to be patched anyway. So I think it is best to write the RFC regarding the toolchain, link this RFC/PR, and then see which direction the discussion takes. Is that OK, or would you do it differently?

@github-actions github-actions bot added has-conflicts Issue/PR has conflicts with another issue/PR and removed has-conflicts Issue/PR has conflicts with another issue/PR labels Jun 29, 2020
@alexanderwachter
Member

@stephanosio @andyross @andrewboie @carlescufi @pabigot.
It seems that this PR is still relevant for newlib support. How do we proceed with that?

@andrewboie
Contributor

> @stephanosio @andyross @andrewboie @carlescufi @pabigot.
> It seems that this PR is still relevant for newlib support. How do we proceed with that?

The issue tracking this is currently labeled "enhancement" and not "bug", and there isn't pressure applied at release time to resolve it since it doesn't contribute to the release bug count requirements. We rely solely on the motivation of the reporter/author to move it along.

I think it should be promoted to "bug", but agree to scope it for 2.5 and find a dedicated owner to see it through if @kaidoho isn't working on it.

@github-actions

github-actions bot commented Nov 3, 2020

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

@github-actions github-actions bot added the Stale label Mar 22, 2021
@github-actions github-actions bot closed this Apr 6, 2021
saininav added a commit to saininav/meta-zephyr that referenced this pull request Dec 20, 2022
Build newlib library to be thread-safe in multithreaded environment.

zephyrproject-rtos/zephyr#21518
zephyrproject-rtos/zephyr#21519
zephyrproject-rtos/zephyr#36201

https://sourceware.org/legacy-ml/newlib/2016/msg01165.html
https://sourceware.org/git/?p=newlib-cygwin.git;a=commit;h=bd54749095ee45d7136b6e7c8a1e5218749c87b6

Error log:
newlib/libc-hooks.c:310:1: note: in expansion of macro 'BUILD_ASSERT'
BUILD_ASSERT(IS_ENABLED(_RETARGETABLE_LOCKING), "Retargetable locking must be enabled");

Signed-off-by: Naveen Saini <[email protected]>
Tested-by: Jon Mason <[email protected]>
Labels
area: API Changes to public APIs area: Architectures area: ARM ARM (32-bit) Architecture area: C Library C Standard Library area: Kernel has-conflicts Issue/PR has conflicts with another issue/PR Stale
9 participants