[RFC] Proposed development plan for Zephyr's POSIX subsystem #17706

Closed
pfalcon opened this issue Jul 22, 2019 · 6 comments
Labels
area: POSIX POSIX API Library RFC Request For Comments: want input from the community

@pfalcon
Collaborator

pfalcon commented Jul 22, 2019

Summary

This RFC seeks to transform and extend Zephyr's POSIX subsystem, which was initially conceived to implement just a small embedded profile specification of POSIX, into a subsystem with wider coverage of the full POSIX standard. While doing so, it doesn't seek to establish a specific (sub)set of the POSIX standard to implement. Instead, it seeks to establish a process and criteria to allow incremental and gradual development and addition of new features, based on the needs of Zephyr stakeholders and the community.

Mission statement

There're 2 ways to develop software for a particular system:

  1. Write it from scratch specifically for a given system.
  2. Port existing software (developed for other systems and/or standard APIs).

Zephyr is a small, efficient RTOS, and thus p.1 was the initial scope. But the importance of p.2 should not be underestimated. The author of this RFC and the growing community of Zephyr users think that the inability to port existing software, or extra hurdles in doing so, is becoming a growing blocker on the way to wider Zephyr adoption and usage.

This RFC seeks to remedy the situation and enable large-scale application porting to Zephyr, by paying close attention to the implementation of the standard OS API (the POSIX standard). At the same time, it seeks to do so in a sustainable, manageable and lean way, following the principles of agile software development and an open-source, community-driven process.

Motivation

Zephyr includes many subsystems, which are largely disjoint. One such subsystem is the BSD Sockets(-like) subsystem, initially written by the author of this RFC. It was initially developed as a proof-of-concept, alternative networking API to Zephyr's own (ad hoc) networking API. There were 3 main reasons why adding a BSD Sockets(-like) API to Zephyr would be useful:

  1. To reuse programmers' existing knowledge and experience when developing Zephyr applications.
  2. To base Zephyr API on well-known, tested and tried design patterns.
  3. To allow porting existing applications to Zephyr.

Of these, p.1 was the initial motivation, while p.2 helped the BSD Sockets(-like) API achieve the status of the official networking API, once it became clear that it provides a good answer to kernel-vs-userspace separation challenges and resource protection needs.

However, leveraging p.3 took some time to gather momentum, with real work starting at the beginning of this year (2019). Even the first porting experiment (see the retrospective section below) exposed big issues with the BSD Sockets(-like) subsystem. As a reminder, this section starts with "Zephyr includes many subsystems, which are largely disjoint." Also, throughout the text, the sockets subsystem is called "BSD Sockets(-like)". The existing sockets subsystem largely follows the spirit of the BSD Sockets API, but lacks a lot of functionality and has a lot of small-ish differences if the POSIX BSD Sockets API is taken literally. It also doesn't integrate well with other Zephyr subsystems, like the existing (also very incomplete) POSIX subsystem and the C library.

All that led to the following observed issues:

  • A lot of functionality has to be added to the BSD Sockets API (even if largely shallow, i.e. simple enough features).
  • There are still trivial/repetitive changes to be made to 3rd-party code, e.g. due to differences in header location between "real POSIX" and "Zephyr's BSD Sockets(-like) subsystem".
  • Finally, existing applications never use just the BSD Sockets API part of POSIX, but rely on various other APIs (and it should be remembered that the C standard library is also part of POSIX). Due to the "disintegrated" nature of Zephyr subsystems, trying to use various parts in the same application led to heavy conflicts and ugly workarounds (that's if a programmer was patient enough to reach for them and not give up, writing off the Zephyr subsystem(s) involved as "completely broken").
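
To illustrate the second issue, here is a minimal sketch of the kind of per-platform include shim that 3rd-party code ends up carrying when socket headers live in different places. It rests on assumptions not stated in this RFC: that the build defines `__ZEPHYR__` when targeting Zephyr, and that `<net/socket.h>` is where the BSD Sockets(-like) header lives. The helper `make_ipv4_any()` is hypothetical and exists only for illustration.

```c
#include <stdint.h>
#include <string.h>

#ifdef __ZEPHYR__
/* Assumption: location of Zephyr's BSD Sockets(-like) header. */
#include <net/socket.h>
#else
/* Header locations mandated by POSIX. */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#endif

/* Hypothetical helper: build an IPv4 wildcard address for the given port. */
static struct sockaddr_in make_ipv4_any(uint16_t port)
{
    struct sockaddr_in addr;

    /* Zeroing the struct leaves the address as all-zeroes (the wildcard). */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port); /* port is stored in network byte order */
    return addr;
}
```

Only the `#ifdef` block is Zephyr-specific here; the function body is the same on both platforms. With POSIX headers at their standard paths, the shim could go away and the 3rd-party source would need no changes for this part.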

Over time, while fighting with the issues described above, the solution became apparent: different Zephyr subsystems should be integrated together under the auspices of the POSIX standard, following it closely to the letter, not just in spirit.

Implementation process

Developing a complete implementation of POSIX IEEE 1003.1 is no simple task, due to the breadth and depth of the standard. It would take dozens of man-years to finish that task. There are no such (formally allocatable) resources in the Zephyr community, so this RFC doesn't lay out a specific plan to implement "full POSIX". Instead, the proposal is to focus on the practical side of things. As the previous sections said, the main motivation is to be able to port/reuse existing application software on Zephyr, and that's why we're interested in POSIX, and not the other way around. Thus, development of the POSIX subsystem should be primarily driven by active porting efforts:

  1. Zephyr community selects POSIX-compatible project(s) to port to Zephyr at a particular streak (based on the community needs/interests).
  2. Features missing from the POSIX subsystem are developed, and functionality that doesn't work properly or faithfully enough gets fixed/extended - all in an incremental manner.
  3. Work done in p.2 is submitted/merged upstream in an agile/streamlined manner, to parallelize the work, let other users benefit even from interim results, and motivate them to join the effort, providing a positive feedback cycle.
  4. The process repeats from p.1.

This is essentially a lean/agile development methodology, where development is driven by short-term needs. As long as the development goes in the right direction (more POSIX functionality gets implemented, even if not completely; CI passes; no obvious mistakes or noticeable/avoidable technical debt are added), it gets merged and the process immediately repeats with the next development iteration.

Of course, besides the community-driven new-feature process, there's also a maintenance process working in the usual way:

  1. As time permits, maintainers select known technical debt, or a known issue to improve.
  2. Improvement is made.
  3. Improvement is merged.
  4. Repeats from p.1.

This process might have more of a background priority and lower intensity, but otherwise follows the same agile workflow as the feature process, with the same acceptance criteria (as long as a change improves the situation and doesn't make it worse, it's good to go).

Relationship to the existing Zephyr POSIX subsystem

This effort is supposed to be fully based on the existing POSIX subsystem, and is intended to continue its development in a continuous, seamless, sustainable fashion. There's no intention to replace it, tear it off, beat it with sticks, or anything like that. There may be a need for deep bug-fixes or wide refactors, but too-deep and too-wide cases should be rare, and each case would be handled on an as-needed basis, following the usual process (big non-trivial changes get RFCed and discussed, etc., while normal changes follow the agile process described above).

It should be noted that the elaboration of the existing POSIX subsystem has been going on for quite some time now, and this RFC effectively just captures these existing practices, to keep the entire Zephyr community in the loop.

During the initial discussion of the development process of the POSIX subsystem (i.e. the subject of this RFC), it was pointed out that the POSIX subsystem was initially intended to implement the PSE52 profile of POSIX. The author of this RFC (also a maintainer of the POSIX subsystem for the last half a year and the author of many changes to it) has to admit that such a claim came as a surprise. A lot of time while preparing this RFC was spent trying to understand this situation and how historical plans for PSE52 profile development affect this RFC. Below, the situation with PSE52 is traced in detail:

  1. Let's try to search the official documentation for PSE52: https://docs.zephyrproject.org/latest/search.html?q=PSE52&check_keywords=yes&area=default . Only one hit, for the 1.11.0 changelog: https://docs.zephyrproject.org/latest/releases/release-notes-1.11.html?highlight=pse52
  2. Let's double-check by grepping doc sources in the tree:
zephyr/doc$ grep -r -i PSE52 *
releases/release-notes-1.11.rst:* POSIX PSE52 partial support.
releases/release-notes-1.11.rst:* POSIX PSE52 support:
releases/release-notes-1.11.rst:* :github:`1291` - Initial Posix PSE52 Support

So, web search works well, there're no other references in the docs.
  3. The 1.11.0 changelog links to #1291 dated 2017-08-30, which indeed talks about implementing PSE52 subset of POSIX.
  4. The answer would be implied by the doc search above, but let's double-check:

zephyr$ grep -r -i PSE52 *
doc/releases/release-notes-1.11.rst:* POSIX PSE52 partial support.
doc/releases/release-notes-1.11.rst:* POSIX PSE52 support:
doc/releases/release-notes-1.11.rst:* :github:`1291` - Initial Posix PSE52 Support

I.e., there are no further mentions of PSE52 in the tree, in particular, no config options specifically for PSE52.

So, how is the existing POSIX subsystem described by the config options?

  1. https://docs.zephyrproject.org/latest/reference/kconfig/CONFIG_POSIX_API.html :

Enable mostly-standards-compliant implementations of various POSIX (IEEE 1003.1) APIs.

I.e. the main POSIX option describes itself as implementing the "big" POSIX (IEEE 1003.1), not some subset (IEEE 1003.13). For full disclosure, this description comes from a patch by the author of this RFC, dated 2018-09-27: 8dc69e0 . This proves the point that the changes described in this document didn't start today or yesterday, but have been in progress for quite some time. (After making a number of changes like that to the POSIX subsystem, the author of this RFC volunteered to be a maintainer of the subsystem, to advance it along the vision now formally described in this document.)

However, I didn't mention IEEE 1003.1 off the top of my head; I essentially just copied it from the description of one of the previous POSIX patches: eb0aaca :

Add IEEE 1003.1 Posix Style file system API support.

That commit was made by one of the original authors of the POSIX subsystem.

Hopefully, the evidence presented is enough to make the following summary:

  1. Indeed, the implementation of the POSIX subsystem started with a request to implement a subset of POSIX functionality mandated by PSE52.
  2. However, even during the initial implementation, there was a leaning towards referring to the full POSIX standard, IEEE 1003.1. (We might pause here and start saying that it was a typo, but such a discussion would lead us sideways, with counter-claims that real-world engineers are less interested in obscure subsets of a well-known API from OpenGroup marketing materials than in the whole API, which allows working with real-world applications. Again, let's please not go there.)
  3. Beyond the initial implementation 2 years ago, the POSIX subsystem has been organically and gradually growing to subsume more POSIX functionality (and a lot of bugfixes). This RFC does nothing but capture the existing process.
  4. In either case, the original PSE52 subset goal doesn't conflict in any way with the subject of this RFC. PSE52 is a subset of full POSIX. Full POSIX is a superset of PSE52. The original PSE52 implementation is not alien to the implementation discussed in this RFC. The wider POSIX implementation discussed in this RFC grows naturally from the original implementation.
  5. It might be a bit different if we had something like lib/pse52. The exact difference would be that this RFC would start with a proposal to rename it to lib/posix. But per p.2, it was done in a future-proof way from the start, so there's nothing to worry about here.

Location of the headers

Another question raised during pre-discussion of this RFC was the location of newly added POSIX headers. The previous section should provide a pretty obvious and natural response: the existing POSIX subsystem has headers in include/posix, thus any extension to it would also have headers in include/posix.

There was speculation that (some?) new headers might be put directly in include/. It would be quite inconsistent and unsustainable to have some POSIX headers under one path, and others under another (also a 3rd, 4th, etc.?). An example was given based on a particular header which was in a subdirectory: include/posix/arpa/inet.h (re: https://github.com/zephyrproject-rtos/zephyr/pull/16621/files), speculating that it could just as well go into include/arpa/inet.h. But that's a very peculiar example. There are many POSIX headers, and the majority of them go into the top-level include directory, not a subdirectory like arpa/ above. Again, we don't want to make confusing rules of what goes where, based on that kind of criteria. This won't be sustainable, and will lead to mistakes, conflicts, duplication, etc.

The interesting follow-up question, however, is whether using the include/posix/ location as was done originally was a good move, and whether it would make sense to revisit it now.

Generally, the header space should be structured and ordered. More specifically, there should be proper namespaces for native Zephyr headers vs POSIX headers. Failing that, there will be confusion and conflicts, again. We actually had an example of that, so such claims come from actual experience: b4b108d .

The ideal situation would be that Zephyr native headers are namespaced, e.g. located in include/zephyr/ and included as #include <zephyr/...>, while POSIX headers are at their natural locations mandated by the standard. This would 100% resolve any risk of namespace conflicts (including with other 3rd-party projects). Unfortunately, the Zephyr TSC recently discussed this question and decided not to move Zephyr headers under zephyr/, so for the mid-term (2-3 years), we're locked out of the opportunity to resolve this issue completely, until further experience with leveraging Zephyr in real-world conditions prompts another iteration of handling this matter.

Then, in the current conditions, sticking with the existing include/posix/ makes good sense - it's an already tested and tried solution, which doesn't give as strong non-conflict guarantees as the one described above, but still provides the bare-minimum required namespace separation. While this solution requires some overhead in managing include paths, and indeed, some recent elaborations of that aren't merged yet (#15937), at least it's by now well understood that these elaborations are required and how to do them.

Retrospective: Existing 3rd-party applications porting projects

  • 2017-08 by @pfalcon (author of the RFC): MicroPython socket module. MicroPython was actually a testbed for prototyping Zephyr's BSD Socket(-like) API. As such, it was initially largely a from-scratch project, so POSIX compatibility matters became apparent only later, when it turned out that MicroPython had a "socket" module implementation for POSIX systems, and in the end had a module for Zephyr which shared probably ~80% of its code with the native POSIX module, with the rest ad hoc to Zephyr. That might be ok for MicroPython, due to its goal of detailed efficiency, but already raised concerns that such an approach wouldn't scale to porting many applications.
  • 2019-01 by @pfalcon: Porting of the OPC UA protocol library open62541. This is a true 3rd-party POSIX API project, which showed a lot of deficiencies in BSD Socket vs POSIX vs libc integration. The work was completed, but with enormous time spent debugging build conflicts and with some dirty, unmergeable workarounds. The majority (a dozen+) of the patches were however cleaned up and merged into Zephyr, a large chunk for 1.14. Then, suddenly, the next chunk of patches (posix: Add headers related to BSD Sockets API #16621, posix: Clean up various headers #16626, lib: posix: Switch to use zephyr_interface_library_named cmake directive #15937), following the very same process as before, attracted blocking attention, which led to the creation of this RFC.
  • 2019-05 by @pfalcon: Porting/elaboration of the Google Cloud Platform IoT Embedded SDK (further called GoogleIoT). GoogleIoT is described as supporting Zephyr, but actually builds only for BOARD=native_posix, and while doing so, includes host-side POSIX headers. It's POSIX-based otherwise. Switching it to the proper Zephyr POSIX subsystem showed all the issues familiar from the open62541 porting work, which served as another motivation to revive, clean up, and submit the patches done while working on it (which now also apply to the GoogleIoT work).
  • 2019-05 by @PiotrZierhoffer, @tgorochowik, et al: Porting of CivetWeb to Zephyr ([RFC] Missing parts of libc required for CivetWeb #16683, Add civetweb HTTP sample #17019). The developers of the port immediately faced the POSIX vs newlib conflicts described above, starting with the open62541 port. Instead of trying to resolve them, they decided to use the minimal libc instead. Of course, it is itself far from POSIX compliance, prompting the developers to import missing functionality from other libc projects, like musl. All this led to concerns about the maintainability of mixing pieces of different flavors of libc, instead of working on elaborating the existing one (newlib). This again served as a motivation to clean up and submit the previous patches that resolve POSIX subsys vs newlib compatibility.

Immediate scope of work

  • Making sure that Zephyr BSD Sockets subsys and Zephyr POSIX subsys properly integrate.
  • Making sure the Zephyr POSIX subsys and newlib, Zephyr's official "full libc", properly integrate.
@pfalcon pfalcon added RFC Request For Comments: want input from the community area: POSIX POSIX API Library labels Jul 22, 2019
@pfalcon
Collaborator Author

pfalcon commented Jul 22, 2019

This is the promised RFC which was discussed at the recent Networking Forum(s) and Dev meeting(s).

cc: @nashif, @mbolivar, @jukkar, @MaureenHelm, @galak, @PiotrZierhoffer, @tgorochowik, @dleach02.

@pfalcon
Collaborator Author

pfalcon commented Jul 22, 2019

@galak, please help me bring this to the TSC/Dev meetings as required to review/discuss this RFC. I'm especially concerned, as I'm on vacation next week, and then there's very little time to merge the long-pending patches for 2.0. As the RFC argues in the pertinent section, all the available patches follow the same course of development as already took place over the last 6-12 months, including the 1.14 release preparation. Thanks.

@pfalcon pfalcon added dev-review To be discussed in dev-review meeting TSC Topics that need TSC discussion labels Jul 22, 2019
@nashif
Member

nashif commented Jul 25, 2019

I have lots of issues with this so-called RFC, which advocates creating a special development process for the POSIX subsystem, but let's start with some sections:

Location of the headers

For the headers, include/posix is not ideal and conflicts with our definition of what include/ should contain (Zephyr public APIs). For POSIX (and other abstractions) there are at least 2 options:

  1. Go the NuttX (and others) route and declare ourselves a POSIX OS and put everything in include/. An application which includes sys/socket.h will just work.
  2. Move include/posix out of include completely and have it where the POSIX subsystem code is maintained, and let applications add this path when they are building a POSIX-compliant application. (afaik FreeRTOS does that)

I am leaning towards option (2), we should do the same with cmsis rtos APIs and any other abstractions in the future.

I'm especially concerned, as I'm on vacation next week, and then there's very little time to merge the long-pending patches for 2.0.

We should not rush into getting this into 2.0. IMO this item is big enough to be a 2.1 item; I do not see us solving all issues with POSIX for 2.0, and it was never on the roadmap anyway. So let's tread slowly and get this right for 2.1.

@pfalcon
Collaborator Author

pfalcon commented Feb 20, 2020

Recently, there were a few similar and related questions I commented on, which I'd like to summarize here for future reference (this summary should be pretty compatible with, and follow from, the original RFC above). The kind of questions being talked about is along the lines of "Why does the Zephyr POSIX subsystem include function fun1()? It's not a POSIX function, it's a BSD (or similar) function" and "Function fun2() [taken from Linux or similar] is not in POSIX, so why do we talk about the POSIX subsystem here?".

To answer these, I'd like to remind that there are 2 meanings of the word POSIX:

  1. POSIX xxxx.y-zz is a specific standard as issued by the standards body.
  2. "POSIX" in the informal sense is a standard which defines well-known, decades-proven, largely vendor-neutral API sets, for which a lot of real-world software is written.

This whole RFC advocates and emphasizes the importance of point 2. That is, the purpose of the Zephyr POSIX subsystem isn't just to put a "conforms to POSIX xxxx.y-zz" badge by the Zephyr name, but also to enable porting and reusing a multitude of real-world software and knowledge on Zephyr. In this regard, besides purely standard-defined APIs, there's a kind of extended API, which can be called "extended POSIX", which perhaps can be subdivided into:

a) "sub-POSIX", or well-known functions which didn't get into the POSIX standard(s) for various reasons. One of the known reasons is that the POSIX standard is known to have an affinity for the System V family of the original Unix, omitting support for BSD-derived functionality. That doesn't mean such functionality doesn't exist or is useless. As long as it's useful to port/be compatible with existing software, we could add BSD-compatible functions.
b) "super-POSIX". (Unix-like) portable operating systems continue to evolve beyond the bounds specified in the currently published POSIX standards. The BSD family, Solaris, and Linux add functionality missing in classic POSIX systems. Of these, Linux is a very popular OS overall, and an obvious affinity choice for Zephyr to follow. Let's put it this way: if the POSIX authority made a call to standardize newer API sets, Linux choices would be among the strong contenders for standardization. (But they're obviously some decade too young to be considered for standardization.)
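
As a concrete illustration of the "sub-POSIX" category, here is a rough sketch of a function with the semantics of strlcpy(), which originated in BSD and was not part of the POSIX standard when this comment was written, yet is used by a lot of existing software. The name `sketch_strlcpy` is hypothetical, chosen to avoid clashing with a libc that already provides strlcpy(); this is not taken from Zephyr, Newlib, or any BSD source.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical strlcpy()-style copy: copies at most size - 1 bytes of src
 * into dst, always NUL-terminates when size > 0, and returns strlen(src)
 * so the caller can detect truncation (return value >= size).
 */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);

    if (size > 0) {
        size_t n = (src_len < size - 1) ? src_len : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

Copying "zephyr" into a 4-byte buffer with this function stores "zep" and returns 6, which tells the caller that the string was truncated.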

Hopefully with the above detailed outline, specific questions can be answered:

Q: Why does the Zephyr POSIX subsystem include some BSD-heritage functions which aren't in POSIX?
A: Because some real-world software was ported to Zephyr which required them. One big component is Newlib, which is used by Zephyr to implement a great deal of POSIX and ANSI C functionality. It includes some BSD-compatible functions at the lowest level of its functionality (e.g. some POSIX functions may be implemented in terms of or with such BSD functions). At the same time, Newlib is not the only libc Zephyr supports, e.g. it has minlibc. Experience showed that to avoid a cascade of build errors when switching from minlibc to Newlib and vice versa, the foundation of minlibc should be aligned with the Newlib implementation.

Q: It would be nice to add some function from Linux to work with other Zephyr POSIX subsystem components (file descriptors, sockets, poll, etc.). In which way should such a Linux function be added?
A: Almost certainly as a part of the (extended) Zephyr POSIX subsystem. The description of the function from Linux manpages (or other documentation sources) should be treated as the normative reference. It's OK to implement a subset of functionality, but the overall interface and behavior should be consistent. As an example, function signatures should match exactly. But it's ok to ignore some parameters which are (currently) not implemented for Zephyr. It's recommended to be thorough enough to write samples/tests which can be built (from the same source) on both Linux and Zephyr, and behave the same way.
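
A same-source test of the kind recommended above might look like the following sketch. It uses only the standard inet_pton()/inet_ntop() functions and the header paths mandated by POSIX. That these headers are reachable under exactly these paths on Zephyr is an assumption (it depends on the POSIX subsystem headers being on the include path), and the function name `ipv4_roundtrip_ok` is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical check: convert an IPv4 address from text to binary and back.
 * Returns 0 if the round trip reproduces the input string, -1 otherwise.
 */
int ipv4_roundtrip_ok(const char *text)
{
    struct in_addr addr;
    char buf[INET_ADDRSTRLEN];

    /* inet_pton() returns 1 only for a valid address in the given family. */
    if (inet_pton(AF_INET, text, &addr) != 1) {
        return -1;
    }
    if (inet_ntop(AF_INET, &addr, buf, sizeof(buf)) == NULL) {
        return -1;
    }
    return strcmp(text, buf) == 0 ? 0 : -1;
}
```

The point is that the source contains no platform `#ifdef`: if it builds and behaves the same on Linux and on Zephyr, the two implementations agree on both interface and behavior for this function pair.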

@galak galak removed the dev-review To be discussed in dev-review meeting label Mar 26, 2020
@nashif nashif removed the TSC Topics that need TSC discussion label Jun 16, 2021
@nashif nashif moved this to Review in RFC Backlog Apr 13, 2023
@nashif
Member

nashif commented Apr 13, 2023

@cfriedt can you please take a look and close if this is already covered in other RFCs?

@cfriedt
Member

cfriedt commented Apr 13, 2023

@nashif - this is great - thanks for finding these slightly older posix issues. You've helped me find the specific standards that we should be implementing for the embedded profile :-)

https://ieeexplore.ieee.org/document/1342418

I think this rfc is still relevant in terms of content. The direction here is a bit vague but I think it agrees with the current one. I'll close this issue but add a reference to the POSIX LTSv3 Roadmap.

@cfriedt cfriedt closed this as completed Apr 13, 2023
@github-project-automation github-project-automation bot moved this from Review to Done in RFC Backlog Apr 13, 2023