Skip to content

posix: semaphore: implement named semaphore functions #67007

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jan 10, 2024

Conversation

ycsin
Copy link
Member

@ycsin ycsin commented Dec 26, 2023

Implemented:

  • sem_open()
  • sem_close()
  • sem_unlink()

Fixes #66955
Fixes #66956
Fixes #66957

@ycsin ycsin requested a review from cfriedt December 26, 2023 10:19
@ycsin ycsin marked this pull request as ready for review December 26, 2023 11:59
sys_snode_t snode;
sem_t sem;
atomic_t ref_count;
char *name;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could avoid malloc if allocated like this

char name[CONFIG_SEM_NAMELEN_MAX];

I think we have this pattern in many places in the code space.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna gather more comments for this one, doing so will allocate a fixed sized array regardless of the actual length of the name.

Alternatively we can allocate the buffer for the nsem_obj and the name array in one go and the total memory allocated will depend on the name length. Ideally we would be able to k_realloc the buffer once the sem is unlinked, however the k_realloc isn't available (#41151), so I opted for the current implementation as the name buffer can be freed separately upon sem_unlink.

Copy link
Member

@cfriedt cfriedt Dec 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say a good follow-up PR would be to add static allocation support.

I'm ok with dynamic going in first and static coming in later, but I think both use cases are valuable for Zephyr.

The existing pattern that doesn't use malloc should be fine for filling the static gap later. Thanks for your help ❤️

@ycsin ycsin mentioned this pull request Dec 28, 2023
@@ -2728,6 +2728,8 @@ POSIX API layer:
status: maintained
maintainers:
- cfriedt
collaborators:
- ycsin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥳

@ycsin ycsin force-pushed the pr/posix_nsem branch 3 times, most recently from f590fa8 to 2f76542 Compare January 2, 2024 04:04
cfriedt
cfriedt previously approved these changes Jan 3, 2024
cfriedt
cfriedt previously approved these changes Jan 3, 2024
@npitre
Copy link
Collaborator

npitre commented Jan 6, 2024

This code is racy.

sem_t *sem_open(const char *name, int oflags, ...)
{
       /* Check if the named semaphore exists */
       k_mutex_lock(&nsem_mutex, K_FOREVER);
       nsem = find_nsem(name);
       k_mutex_unlock(&nsem_mutex);

       /* !!! preemption point !!! */
       [...]

       if (nsem == NULL) {
                [...]
       } else {
               atomic_inc(&nsem->ref_count);
       }

When find_nsem() returns a valid non-null pointer for nsem,
nothing prevents another thread calling sem_unlink() and sem_close()
at the indicated preemption point, turning nsem into a zombie pointer
and corrupting freed and potentially reused memory.

@ycsin ycsin force-pushed the pr/posix_nsem branch 3 times, most recently from 90b0fee to bec4c00 Compare January 8, 2024 16:29
@npitre
Copy link
Collaborator

npitre commented Jan 8, 2024

The code now looks correct, the race is gone.

Here's some suggestions that aren't critical but could optimize the code
further:

  • All ref_count manipulations, except for one case, happen while
    nsem_mutex is taken. The exception is sem_close() which is unlikely
    to be on any performance-critical path. By enlarging the nsem_mutex
    coverage in sem_close() you could drop all atomic operations and convert
    ref_count to a plain int which would reduce the binary footprint, and
    possibly run faster on some architectures.

  • You could consider the presence of a name as ref_count worthy. This
    means initializing ref_count to 2 instead of 1 in sem_open() and
    decrementing ref_count in sem_unlink(). This would allow for removing
    the nsem->name == NULL test in sem_close() relying on the count alone

  • With the above, the ref_count decrements could be factored into
    remove_nsem_if_unused() (and the function properly renamed to something
    like e.g. nsem_put_ref() or the like) which would reduce binary footprint
    further. This would open the possibility for cleanly asserting that the
    count does not go negative and there is no longer a name if the count is 0
    in only one place.

@npitre
Copy link
Collaborator

npitre commented Jan 8, 2024

And another comment. You have:

#define SEM_FAILED (-1)

Since this is used where a pointer is normally returned, it would be nicer
to add the cast to the definition directly. Also, it is typically defined
as 0 so the error case can be seen as a null pointer return. So I'd suggest:

#define SEM_FAILED ((sem_t *) 0)

@ycsin
Copy link
Member Author

ycsin commented Jan 9, 2024

  • You could consider the presence of a name as ref_count worthy. This
    means initializing ref_count to 2 instead of 1 in sem_open() and
    decrementing ref_count in sem_unlink(). This would allow for removing
    the nsem->name == NULL test in sem_close() relying on the count alone
  • With the above, the ref_count decrements could be factored into
    remove_nsem_if_unused() (and the function properly renamed to something
    like e.g. nsem_put_ref() or the like) which would reduce binary footprint
    further. This would open the possibility for cleanly asserting that the
    count does not go negative and there is no longer a name if the count is 0
    in only one place.

Hey @npitre, thanks for the suggestions!

Regarding to point 2 and 3, I've added assert tests to make sure that the ref_count doesn't underflow or overflow, but I'm not sure how to properly 'encodes' the unlink function call into the ref_count while still be able to guarantee that a named semaphore is destroyed only if it is unlinked and closed, i.e. without the nsem->name == NULL test, wouldn't sem_close + sem_close equals to sem_unlink + sem_close?

@ycsin
Copy link
Member Author

ycsin commented Jan 9, 2024

Updated to:

  • adopt suggestions from @npitre (refactor for the removal of atomic, under/overflow tests)
  • added a little stress test for the named semaphore (one thread sem_open a semaphore repeatedly and finally sem_unlink, the other thread doing sem_close)
  • added a bit more testpoints for using a named semaphore in the normal semaphore test.

Add myself as a collaborator in the POSIX subsystem.

Signed-off-by: Yong Cong Sin <[email protected]>
@ycsin ycsin force-pushed the pr/posix_nsem branch 2 times, most recently from 4767810 to 2d9c415 Compare January 9, 2024 14:59
@ycsin ycsin requested a review from cfriedt January 9, 2024 16:34
@npitre
Copy link
Collaborator

npitre commented Jan 9, 2024

i.e. without the nsem->name == NULL test, wouldn't sem_close +
sem_close equals to sem_unlink + sem_close?

Consider these events:

operation                refcount change          refcount state
----------------------------------------------------------------

sem_open()               = 2 (created)            2
sem_open()               + 1 (already exists)     3
sem_close()              - 1                      2
sem_close()              - 1                      1
sem-unlink()             - 1                      0 (destroyed)

sem_open()               = 2 (created)            2
sem_open()               + 1 (already exists)     3
sem_unlink()             - 1                      2
sem_close()              - 1                      1
sem_close()              - 1                      0 (destroyed)

I'll push this change with a few misc tidbits on top of your pull request.
Feel free to fold them in your original commits or have them merged as is.

if (nsem->ref_count == 0) {
__ASSERT(nsem->name == NULL, "ref_count is 0 but sem is not unlinked");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@npitre This means that the following behavior is guaranteed only when CONFIG_ASSERT is enabled, is that intended?

If the semaphore has not been removed with a successful call to sem_unlink(), then sem_close() has no effect on the state of the semaphore.

i.e.: a named semaphore can be closed without being unlinked:

operation                refcount change          refcount state
----------------------------------------------------------------

sem_open()               = 2 (created)            2
sem_close()              - 1                      1
sem_close()              - 1                      0

@npitre
Copy link
Collaborator

npitre commented Jan 10, 2024

if (nsem->ref_count == 0) {

  • __ASSERT(nsem->name == NULL, "ref_count is 0 but sem is not unlinked");

@npitre This means that the following behavior is guaranteed only when
CONFIG_ASSERT is enabled, is that intended?

Yes. This would happen only if the semaphore user's code is buggy.
This should never happen with well behaved code. Assertions can be configured
out to save on binary size and runtime overhead once the code is proven to
behave properly.

If the semaphore has not been removed with a successful call to
sem_unlink()](https://pubs.opengroup.org/onlinepubs/9699919799/functions/sem_unlink.html),
then em_close()](https://pubs.opengroup.org/onlinepubs/9699919799/functions/sem_close.html)
has no effect on the state of the semaphore.

i.e.: a named semaphore can be closed without being unlinked

Yes, that is what the code does. The test suite exercizes that condition too.

operation                refcount change          refcount state
----------------------------------------------------------------

sem_open()               = 2 (created)            2
sem_close()              - 1                      1
sem_close()              - 1                      0

That is a bad example. You normally must have balanced open() and
close(). Closing your own instance twice is buggy. Inagine what would
happen if you have an open() from two different threads and one thread
does a close() twice. The other thread is unlikely to be happy about that.

Multiple close() for a single open() is so unusual and unexpected
that the spec would mention it explicitly if that were allowed.

@@ -15,7 +15,7 @@
struct nsem_obj {
sys_snode_t snode;
sem_t sem;
unsigned int ref_count;
int ref_count;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@npitre I've modified other part that uses ref_count to int as well, and changed the overflow test from __ASSERT_NO_MSG(nsem->ref_count != UINT_MAX) to __ASSERT_NO_MSG(nsem->ref_count != INT_MAX)

ycsin and others added 6 commits January 10, 2024 11:42
Implements `sem_open()`, `sem_unlink()` & `sem_close()`
functions and added tests for them.

Updated existing tests and POSIX docs.

Signed-off-by: Yong Cong Sin <[email protected]>
The `sem_open()`, `sem_close()` & `sem_unlink()` functions are
now implemented, so mark them as supported.

Signed-off-by: Yong Cong Sin <[email protected]>
Localize a few public variables and refactor the test and
functions so that they are reusable.

Signed-off-by: Yong Cong Sin <[email protected]>
Run the normal semaphore test with a named semaphore.

Signed-off-by: Yong Cong Sin <[email protected]>
- Regroup refcount decrement and semaphore destruction by making the
  linked state into a counted reference for it. This allows for
  simplifying the code and cleanly adding a few assertions in a common
  location.

- Remove redundant initialization to NULL on memory about to be freed
  and local pointer in nsem_cleanup().

Signed-off-by: Nicolas Pitre <[email protected]>
- Adjust refcount checks with regards to previous commit.

- Remove redundant zassert_not_null() on local pointers after a
  sem_close(). There is no implicit reference passing in C.

Signed-off-by: Nicolas Pitre <[email protected]>
@ycsin
Copy link
Member Author

ycsin commented Jan 10, 2024

Feel free to fold them in your original commits or have them merged as is.

@npitre I've took the liberty to fold some of the fixes from your commits into the original commits where they where initially introduced and reworded your commits, but left the ref_count changes as-is in your commits

@cfriedt cfriedt merged commit e5b3231 into zephyrproject-rtos:main Jan 10, 2024
@ycsin ycsin deleted the pr/posix_nsem branch January 10, 2024 12:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

posix: implement sem_unlink() posix: implement sem_open() posix: implement sem_close()
5 participants