native: Added arch,soc & board to run Zephyr natively in a POSIX OS #4174

Merged
merged 34 commits into from
Dec 27, 2017

Conversation

aescolar
Member

@aescolar aescolar commented Oct 4, 2017

Proof of concept of a native port of Zephyr to run on a POSIX OS.

It includes:
A new arch (posix) which relies on pthreads to emulate the context switching
A new soc for it (inf_clock) which emulates a CPU running at an infinitely high
clock (so when the CPU is awakened it runs to completion in zero time).
A new board, which provides a trivial system tick timer and irq generation, to be able to run as a standalone console program.

Note that this does not provide Kconfig and Makefiles to integrate with the
default build system.

All this is work in progress.
The garbage/ folder is not meant to be merged ever (please ignore it)

Origin: Original

Fixes #1891

Signed-off-by: Alberto Escolar Piedras [email protected]

I haven't dug into the build system, so you cannot configure it to build through the normal build flow. But I have added a trivial Makefile in the garbage folder that you can use to build it. If you want to try it, in the repo root, run
export COMPILE_OUTPUT_DIR=foobar/
make -f garbage/Makefile compile
that will build the philosophers app into foobar/zephyr_posix_exe
you can then run that binary directly (or in gdb if you want).
If you want to build another sample app, modify garbage/c_files_list accordingly

Please note that all of it is quite hacky at this point. The aim of this pull request is to start a discussion about the fitness of the approach, and to gather comments (if you have many, please focus on the most important matters and ignore typos and similar)

[updated 2017/11/28] : see nashif's comment below for how to build. Note that now sanitycheck can also be used: sanitycheck -p simple_process

@nashif
Member

nashif commented Oct 4, 2017

just noticed that when building the synchronisation sample, it is going really fast where it should be sleeping for half a second, any idea what is going on?

Contributor

@andrewboie andrewboie left a comment

I understand this is WIP code but a few things stand out:

  1. Coding style. We have a particular coding style for Zephyr that this patch doesn't follow at all. https://www.zephyrproject.org/doc/contribute/contribute_guidelines.html#coding-style Need you to fix this ASAP, and verify that it is in compliance using checkpatch.pl

  2. The complete lack of interrupts is very concerning. This basically means that all threads become cooperative, and irq_lock() is a no-op. It seems to me that it will be quite easy to write code that works well in this particular environment, but will not work at all on real HW due to concurrency problems. I would really suggest trying a different approach here, possibly something that uses POSIX signals to emulate timer interrupts.

  3. Sanitycheck integration. The way you are going to prove that this works is by showing that all of our unit tests pass at runtime using the sanitycheck tool. On other arches it does this by running them under QEMU. I'd like to see support added to sanitycheck to run all the test cases in posix environment.

@pfalcon
Collaborator

pfalcon commented Oct 4, 2017

it is going really fast where it should be sleeping for half a second, any idea what is going on?

Description says:

emulates a CPU running at an infinitely high clock

I wonder if that's it ;-).

@pfalcon
Collaborator

pfalcon commented Oct 4, 2017

The feature is absolutely great, +1 for having it. I didn't look at the code yet though.

@lpereira
Collaborator

lpereira commented Oct 4, 2017

I will read the code more carefully, but I agree with @pfalcon: this is indeed a welcome feature; thanks for working on it.

From a quick glance, though, I'd like to add to comments made already.

I suggest steering clear of signals to simulate interrupts. Being asynchronous, their handling is tricky and very prone to headaches. It might be cleaner to just have a thread that blocks on a read to a socketpair or a pipe and call the appropriate ISRs (this would simulate better an interrupt context, anyway).

In fact, it might be a good idea to have a main loop of sorts, to handle things like timers (even the system timer), interrupts, and power management; this will make the port a lot cleaner. Supporting something like a tickless kernel would be pretty easy. Writing a main loop that is portable can be quite tricky, so I suggest using a library such as libuv or libevent.
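A minimal sketch of the "thread that blocks on a read to a pipe and calls the appropriate ISRs" idea, assuming one byte per raised interrupt. All names here (`irq_thread`, `raise_irq`, `isr_table`, `N_IRQS`) are hypothetical and not taken from the PR. Note that in this sketch the ISRs run in their own host thread, so a real port would still have to coordinate them with the emulated Zephyr threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define N_IRQS 16 /* hypothetical number of interrupt lines */

typedef void (*isr_t)(void);

static isr_t isr_table[N_IRQS]; /* handler per interrupt line */
static int irq_pipe[2];         /* [0] = read end, [1] = write end */
static pthread_t irq_tid;
static int tick_count;          /* touched only by the example ISR */

/* Example ISR: counts how many times it has been called. */
static void tick_isr(void)
{
	tick_count++;
}

/* "Interrupt context": block on the pipe and run the matching ISR. */
static void *irq_thread(void *arg)
{
	uint8_t irq;

	(void)arg;
	/* read() returns 0 once the write end is closed, ending the loop. */
	while (read(irq_pipe[0], &irq, 1) == 1) {
		if (irq < N_IRQS && isr_table[irq] != NULL) {
			isr_table[irq]();
		}
	}
	return NULL;
}

/* Called by a HW model to raise interrupt line `irq`. */
static void raise_irq(uint8_t irq)
{
	ssize_t n = write(irq_pipe[1], &irq, 1);

	assert(n == 1);
	(void)n;
}

/* Create the pipe and the dispatcher thread; returns 0 on success. */
static int irq_sim_start(void)
{
	if (pipe(irq_pipe) != 0) {
		return -1;
	}
	return pthread_create(&irq_tid, NULL, irq_thread, NULL);
}

/* Close the write end so the dispatcher drains the pipe and exits. */
static void irq_sim_stop(void)
{
	close(irq_pipe[1]);
	pthread_join(irq_tid, NULL);
	close(irq_pipe[0]);
}
```

Interrupts raised before `irq_sim_stop()` are still served, because the dispatcher only sees end-of-file after it has read every queued byte.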

@andrewboie
Contributor

andrewboie commented Oct 4, 2017

I suggest steering clear of signals to simulate interrupts. Being asynchronous, their handling is tricky and very prone to headaches. It might be cleaner to just have a thread that blocks on a read to a socketpair or a pipe and call the appropriate ISRs (this would simulate better an interrupt context, anyway)

My thought process here was that indeed signals are tricky, but so are real interrupt handlers, both can be delivered at any time unless specifically suppressed and code needs to be careful to properly lock around them in critical sections. I'm not immediately seeing how doing this in the context of a dedicated thread would be a better simulation, especially if we are trying to simulate nested interrupts.
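For comparison, a rough sketch of the signal-based alternative, assuming `SIGALRM` stands in for the timer interrupt and that locking interrupts maps to blocking that signal. The function names (`irq_sim_init`, `irq_lock_sim`, `irq_unlock_sim`) are made up for illustration. It uses `sigprocmask()`, which is only specified for single-threaded processes; a port built on pthreads would presumably need `pthread_sigmask()` instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Ticks seen by the emulated timer ISR. */
static volatile sig_atomic_t tick_count;

/* Signal handler playing the role of the timer ISR. */
static void timer_isr(int signo)
{
	(void)signo;
	tick_count++;
}

/* Install the handler; returns 0 on success. */
static int irq_sim_init(void)
{
	struct sigaction sa = { 0 };

	sa.sa_handler = timer_isr;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = SA_RESTART;
	return sigaction(SIGALRM, &sa, NULL);
}

/* Emulated irq_lock(): block SIGALRM, return the old mask as the key. */
static sigset_t irq_lock_sim(void)
{
	sigset_t block, old;
	int rc;

	sigemptyset(&block);
	sigaddset(&block, SIGALRM);
	rc = sigprocmask(SIG_BLOCK, &block, &old);
	assert(rc == 0);
	(void)rc;
	return old;
}

/* Emulated irq_unlock(): restore the mask; a pending SIGALRM fires now. */
static void irq_unlock_sim(sigset_t key)
{
	sigprocmask(SIG_SETMASK, &key, NULL);
}
```

A signal raised while locked stays pending and is delivered when the mask is restored, which is the behaviour a critical section relies on. In a real setup the signal would come from a host timer rather than from `raise()`.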

@pfalcon
Collaborator

pfalcon commented Oct 4, 2017

+1 for @andrewboie's comment - signals are indeed async, and that's why they're closest thing to the real IRQs on POSIX.

@aescolar
Member Author

aescolar commented Oct 5, 2017

Thanks for looking into this.
Let me describe a bit better what is done and why:

This solution is based on some assumptions about what it is that we try to do,
but it is layered so that one can reuse the arch or soc if one shares their assumptions, and build a different soc or board if one does not share the assumptions of the layers above.
The division in arch/soc/board is not really trying to match what would be different in real SOC/boards but these layers of assumptions.

The arch only emulates the thread creation and context switching. It is based on POSIX threads.
It works by spawning 1 pthread per Zephyr thread, and keeps all but one of them blocked, to emulate the fact that Zephyr is meant for a single-threaded CPU.
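A minimal sketch of that mechanism, assuming a single "who may run" variable protected by a mutex and condition variable. The names (`swap_to`, `hand_over`, `running_thread`) are hypothetical and are not the PR's actual implementation; two workers take turns appending a letter to a trace.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

static pthread_mutex_t baton_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t baton_cond = PTHREAD_COND_INITIALIZER;
static int running_thread = -1; /* index of the only thread allowed to run */

static char trace[16];
static int trace_len;

/* Block the caller until it is the thread allowed to run. */
static void wait_until_allowed(int self)
{
	pthread_mutex_lock(&baton_mtx);
	while (running_thread != self) {
		pthread_cond_wait(&baton_cond, &baton_mtx);
	}
	pthread_mutex_unlock(&baton_mtx);
}

/* Give the emulated CPU to `next` without waiting to get it back. */
static void hand_over(int next)
{
	pthread_mutex_lock(&baton_mtx);
	running_thread = next;
	pthread_cond_broadcast(&baton_cond);
	pthread_mutex_unlock(&baton_mtx);
}

/* Emulated context switch: let `next` run, block until we are resumed. */
static void swap_to(int self, int next)
{
	hand_over(next);
	wait_until_allowed(self);
}

static void *worker(void *arg)
{
	int self = (int)(intptr_t)arg;
	int other = 1 - self;

	wait_until_allowed(self);
	for (int i = 0; i < 3; i++) {
		assert(running_thread == self); /* nobody else is running */
		trace[trace_len++] = (char)('A' + self);
		if (i < 2) {
			swap_to(self, other);
		}
	}
	hand_over(other); /* done: let the other thread finish */
	return NULL;
}

/* Run two workers that alternate strictly; returns the recorded trace. */
static const char *run_demo(void)
{
	pthread_t t[2];

	memset(trace, 0, sizeof(trace));
	trace_len = 0;
	running_thread = 0; /* thread 0 gets the CPU first */
	for (intptr_t i = 0; i < 2; i++) {
		pthread_create(&t[i], NULL, worker, (void *)i);
	}
	for (int i = 0; i < 2; i++) {
		pthread_join(t[i], NULL);
	}
	return trace;
}
```

Although two host threads exist, the trace is always strictly alternating, because each one only touches shared state while it holds the baton.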

The next thing, and this is the first big assumption is in the SOC:
The CPU runs at an infinitely high clock. So each time the CPU awakes, it runs to completion before the HW moves in time at all (even a "delta" cycle).
The SOC part provides the CPU "booting", awaking and halting (the IRQ enabling/disabling is meant to be a pass-through to the board).
NOTE: There are interrupts. In this quick proof of concept, because the board only provides 1 IRQ, there wasn't yet a need to mask, enable or disable them. [Please ignore the comments inside _arch_irq_lock/unlock; they are misleading.]
But by the infinite CPU clock assumption, an IRQ cannot come while the CPU is awake.

[updated 2017/11/28]: With the current irq controller in the board, SW Interrupts can come while the SW is executing, and enabling/unlocking HW interrupts will cause them to trigger depending on priority
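The behaviour described in this update (interrupts raised while locked stay pending and trigger on unlock, by priority) can be sketched roughly as below. This is an illustration under assumed names (`irq_raise_sim`, `irq_lock_sim`, `irq_unlock_sim` and so on), not the board's actual irq controller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define N_IRQS 32

typedef void (*isr_t)(void);

static isr_t isr_table[N_IRQS];
static uint8_t irq_prio[N_IRQS]; /* lower value = higher priority */
static uint32_t irq_pending;     /* bit n set: IRQ n raised, not yet served */
static bool irqs_locked;
static int running_prio = 256;   /* 256 = thread level, below any ISR */

static int served[8];            /* order in which the example ISRs ran */
static int n_served;

static void isr_irq3(void) { served[n_served++] = 3; }
static void isr_irq5(void) { served[n_served++] = 5; }

static void irq_connect_sim(unsigned int irq, uint8_t prio, isr_t isr)
{
	assert(irq < N_IRQS && isr != NULL);
	isr_table[irq] = isr;
	irq_prio[irq] = prio;
}

/* Serve pending IRQs, highest priority first, unless locked. */
static void dispatch_pending(void)
{
	while (!irqs_locked) {
		int best = -1;

		for (int n = 0; n < N_IRQS; n++) {
			if ((irq_pending & (UINT32_C(1) << n)) != 0 &&
			    irq_prio[n] < running_prio &&
			    (best < 0 || irq_prio[n] < irq_prio[best])) {
				best = n;
			}
		}
		if (best < 0) {
			return;
		}
		irq_pending &= ~(UINT32_C(1) << best);

		int saved_prio = running_prio;

		running_prio = irq_prio[best]; /* only higher prio may nest */
		isr_table[best]();
		running_prio = saved_prio;
	}
}

/* A HW model (or SW) raises an interrupt line. */
static void irq_raise_sim(unsigned int irq)
{
	assert(irq < N_IRQS && isr_table[irq] != NULL);
	irq_pending |= UINT32_C(1) << irq;
	dispatch_pending();
}

/* Returns the previous lock state as the key. */
static bool irq_lock_sim(void)
{
	bool key = irqs_locked;

	irqs_locked = true;
	return key;
}

static void irq_unlock_sim(bool key)
{
	irqs_locked = key;
	dispatch_pending();
}
```

With IRQ 3 connected at priority 1 and IRQ 5 at priority 2, raising 5 and then 3 while locked serves nothing until the unlock, at which point 3 runs before 5.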

The next thing is the board, which assumes you do not want in any way to link the simulated time to real time. [@nashif That is why half a simulated second passes very fast in real time]. The board provides all HW models, including IRQ controller, and therefore also the IRQ handling.
The board would be the part one would always replace completely when wanting to run Zephyr in/with a simulation framework.
The one provided is almost the most trivial/barebones thing which is enough to run those sample apps. It just provides the printk hook, a system timer/ticker and a trivial irq controller & handler.

@aescolar
Member Author

aescolar commented Oct 5, 2017

Now the background for these assumptions == What it is that we try and don't try to do with a soc/board like this.
(Note that I start now talking about all assumptions together; But remember that if for some use case you do not share some, you only need to add a new board or soc).

  • The main target of this is fast functional regression testing, simulations in which we have the full functionality of the code, and functional debugging of the code. [*1]
  • We want to run tests fast (as in several minutes of simulated time per wall time second)
  • We want full reproducibility between runs. Therefore any "randomness" must be controlled.
  • We want to be able to run in a debugger and get exactly the same results. It should not matter if you stop in a breakpoint, go to a meeting, and 1 hour later continue where you left it. [*2]
  • We do not try to use real host peripherals, but simulated HW or HW models.
  • We do not try to replace cycle accurate instruction set simulators (ISS), dev boards, or QEMU, but to complement them. We want to make *1 and *2 easy and fast.

The aim is NOT to debug HW/SW races, missed HW programming deadlines, or cases in which an interrupt comes when you did not expect it. I think, the proper way to debug that would be with a cycle accurate ISS (or a dev board).
Note that as we compile natively to x86, we don't have any way of knowing how long that code would have taken to execute on the real target.
Note also that a workstation CPU runs at 3+GHz, and the native kernel time slices feel like an eternity for our embedded code.
[If you let HW models run in one thread simultaneously with the app/os code in another thread, and use an asynchronous mechanism to throw something from one to the other you will see things like a few seconds of simulated time pass before the kernel/app reacts to a time critical interrupt. Moreover, it will depend greatly on the machine you use and its load]

Philosophical background:
Lots of bugs are functional bugs. Especially the higher you go up in the application/network stack, the more your bugs are functional bugs and the less they are related to the exact moment an interrupt comes.
Most networking protocol bugs will not depend on exactly when an interrupt comes, but on the content of a packet or on when it comes relative to some protocol handshake or FSM state.
We have been using this kind of native port for quite some time in another OS, and we are very happy with what it provides us. For example, catching a NULL pointer dereference is a matter of a few seconds (just run the process in Valgrind); adding coverage is trivial; reproducing a bug somebody else caught in his setup is just a matter of running the same thing he did (it saves a lot of pain in cases like "this crash happens every 100 hours on average, it seems to happen around here, let's add a few more traces and if we are lucky in 1 week we know more", when the cause is a functional issue [say a server throws you a disconnect when you did not expect it]); etc.

@aescolar
Member Author

aescolar commented Oct 5, 2017

One comment more, to clarify. If you want to slow down things to run in real time, because you want to, say, talk with a real server and have real timeouts, all you need to do is have a different board with a minor difference.
In here
https://github.com/aescolar-ot/zephyr/blob/c927649cf26e783b492541a582019c09afeeb52d/boards/posix/simple_process/hw_models.c#L22
(or wherever your HW model would advance time), you would do a nanosleep() of the difference between the time you wanted to wait, and the amount of time you actually spent since the last time you advanced time.
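A rough sketch of that slowdown, assuming the HW model keeps simulated time in microseconds and advances it through a single function. The names `hw_models_init`, `hw_advance_time_to` and `sim_now_us` are invented for this example and are not the contents of the linked hw_models.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

static uint64_t sim_now_us;        /* simulated time since boot */
static struct timespec real_start; /* wall clock at simulated time 0 */

/* Real (host) time elapsed since hw_models_init(), in microseconds. */
static uint64_t real_elapsed_us(void)
{
	struct timespec now;
	int64_t ns;

	clock_gettime(CLOCK_MONOTONIC, &now);
	ns = (int64_t)(now.tv_sec - real_start.tv_sec) * 1000000000LL +
	     (int64_t)(now.tv_nsec - real_start.tv_nsec);
	return (uint64_t)(ns / 1000);
}

static void hw_models_init(void)
{
	clock_gettime(CLOCK_MONOTONIC, &real_start);
	sim_now_us = 0;
}

/*
 * Advance simulated time to `target_us`. If the host got there early,
 * sleep for the difference so simulated time tracks real time.
 */
static void hw_advance_time_to(uint64_t target_us)
{
	uint64_t real_us = real_elapsed_us();

	assert(target_us >= sim_now_us); /* time never goes backwards */

	if (target_us > real_us) {
		uint64_t diff_us = target_us - real_us;
		struct timespec req = {
			.tv_sec = (time_t)(diff_us / 1000000),
			.tv_nsec = (long)((diff_us % 1000000) * 1000),
		};

		/* Resume the sleep if a signal interrupts it. */
		while (nanosleep(&req, &req) == -1 && errno == EINTR) {
		}
	}
	sim_now_us = target_us;
}
```

If the host is already behind (the code took longer than the simulated interval), no sleep happens and simulated time simply catches up, so this only slows things down and never speeds them up.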

[Update 2017/12/04] Select the SIMPLE_PROCESS_SLOWDOWN_TO_REAL_TIME board config option

@aescolar
Member Author

aescolar commented Oct 5, 2017

[Update 2017/12/04] : comment obsolete. See above
@nashif @carlescufi : Now the simple_process board will, by default, run in real time.
You can set it to run as before by defining PULL_THE_HANDBRAKE to 0 in
https://github.com/aescolar-ot/zephyr/blob/e44029b62bd5d6f338391f9581fd1967a1f848c6/boards/posix/simple_process/hw_models.c#L16

@andrewboie
Contributor

I see where you're coming from but it still doesn't sit well with me that all threads which would normally be preemptible are forced to be non-preemptible under this target.

I suppose we'll have to see how things look when we start running test cases under sanitycheck. I have a feeling a lot of stuff is not going to work due to the assumptions you have made about timing (or complete lack thereof)

The coding style issues are a dealbreaker and must be addressed.

@aescolar
Member Author

aescolar commented Oct 5, 2017

Thanks @andrewboie,
Regarding the coding style: that is the easiest thing to fix [as I commented, all this started as, and still kind of is, a proof of concept, so there are quite a few shortcuts. Not setting my editor to a different style was one of them :)]

Regarding the assumption of timing, if you can think of some particular case which could cause trouble, it could be useful to start looking into it.

Would it help to imagine this target as if it were a real development board which has an outrageously high CPU clock compared to the final target, but that people like using because it provides very nice debugging capabilities and costs nothing?
In such a target, similarly, (almost) all interrupts would come when the app/OS has gone to idle. So you would have the same issue: no interrupt-driven context switches in the middle of threads. The threads are still pre-emptible. They are just lucky it does not happen.
The case in which I would expect problems is if somebody has code that assumes that N NOPS after doing something the HW will be ready, or if there is a busy wait until a register is set (but as that would be very HW dependent I expect either it won't matter because it would be in unused drivers, or that people would be able to accept an ifdef).
I saw there is a k_busy_wait(), and I guess that will require an ARCH_HAS_CUSTOM_BUSY_WAIT kind of ifdef.

@andrewboie
Contributor

andrewboie commented Oct 5, 2017

if you can think of some particular case which could cause trouble, it could be useful to start looking into it.

Get sanitycheck working with this port.

Would it help to imagine this target as if it were a real development board which has an outrageously high CPU clock compared to the final target, but that people like using because it provides very nice debugging capabilities and costs nothing?

Not really, since if I understand this port correctly, something like k_sleep(500) will return immediately and not 500 milliseconds later. Even very fast CPUs can be tuned such that timer APIs work as expected. Unless you've found a way to ensure if thread A sleeps for 500ms and thread B sleeps for 250ms, thread B wakes up first, that sort of thing.

@aescolar
Member Author

aescolar commented Oct 5, 2017

To the best of my understanding k_sleep() works as it is supposed to. You can just try the sync app with the latest commit I made.
The only reason Anas saw things happening very fast is that the first "board" was running in simulated time, which is much faster than real time. The app and kernel code was executing exactly the same path it executes now.
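To illustrate why sleep ordering is preserved even when simulated time runs freely, here is a small sketch, assuming the board keeps a list of pending timeouts and, whenever the CPU goes idle, jumps straight to the earliest one. The names (`sim_sleep`, `sim_idle_until_next_wakeup`) are hypothetical and this is not the kernel's or the board's real timeout code.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TIMERS 8

struct sim_timer {
	uint64_t expiry_ms; /* absolute simulated time of the wake-up */
	int thread_id;      /* thread to wake */
};

static struct sim_timer timers[MAX_TIMERS];
static int n_timers;
static uint64_t sim_now_ms; /* current simulated time */

/* A thread asks to sleep: just record when it must be woken. */
static void sim_sleep(int thread_id, uint64_t duration_ms)
{
	assert(n_timers < MAX_TIMERS);
	timers[n_timers].expiry_ms = sim_now_ms + duration_ms;
	timers[n_timers].thread_id = thread_id;
	n_timers++;
}

/*
 * The CPU went idle: jump simulated time to the earliest expiry and
 * return the thread to wake, or -1 if nothing is pending.
 * No real time needs to pass for this jump.
 */
static int sim_idle_until_next_wakeup(void)
{
	int first = 0;
	int id;

	if (n_timers == 0) {
		return -1;
	}
	for (int i = 1; i < n_timers; i++) {
		if (timers[i].expiry_ms < timers[first].expiry_ms) {
			first = i;
		}
	}
	id = timers[first].thread_id;
	sim_now_ms = timers[first].expiry_ms;
	timers[first] = timers[n_timers - 1]; /* remove the served entry */
	n_timers--;
	return id;
}
```

If thread A (id 0) sleeps for 500 ms and thread B (id 1) for 250 ms, the first wake-up returns B at simulated time 250 and the second returns A at 500, regardless of how little wall-clock time has passed.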

@nashif
Member

nashif commented Oct 5, 2017

To the best of my understanding k_sleep() works as it is supposed to. You can just try the sync app with the latest commit I made.

yes, works better now with latest commit.

@carlescufi
Member

carlescufi commented Oct 6, 2017

Here's my 2 cents about this after taking a quick glance at the code and discussing with @aescolar-ot offline. I believe we need to distinguish exactly what we want to achieve with a "native" or "POSIX" port. In my eyes what developers and users expect from a native port is the following:

  1. Develop Zephyr applications or high-level libraries (i.e. not kernel or low-level drivers) without the need for ISA emulation (i.e. QEMU) or hardware
  2. Ability to use all development tools available for desktop applications
  3. Ability to interface with external devices (in particular IP-enabled but BLE or Thread could be a possibility as well) to test the embedded application under a desktop development environment but communicating with real devices
  4. Faithful (as much as possible) replication of the execution environment that one would find in the real hardware, including timeouts, race conditions and memory but not CPU execution speed. This in particular I derive from the current batch of emulators doing the rounds in mobile application development (iOS/Android)

Additionally, the fast turnaround time and simplified development when developing, in the future, display-enabled applications where the GUI is part of the app itself (i.e. integration with SDL or similar in the future)

On the other hand, @aescolar-ot has another set of goals in mind, a bit different from the ones above:

  1. Very fast, non-realtime simulation of hardware (and in particular) network topologies. This includes the ability to "fast-forward" time or let it run freely at high speeds to simulate hours or months of execution in seconds or minutes
  2. Hardware and network-oriented simulation, without regard for the faithfulness of the firmware code execution in terms of context switches, preemption and, of course, CPU speed

Please correct me if I'm wrong, @aescolar-ot so we can all be on the same page. But assuming what I stated is correct, then I think we have 2 ways to move forward:

a) Create 2 separate archs or ports: a "native" one that corresponds to the Zephyr users' expectations and a "sim" one that is used for high-speed simulation
b) Create a single one that is configurable to behave as one or the other based on Kconfig settings

@aescolar
Member Author

aescolar commented Oct 6, 2017

Thanks @carlescufi. I agree that there seems to be some misunderstanding about what the aim of the port should be. Different people can have different objectives in mind.
That was the reason why I layered the arch/soc/board in that way, to allow for reuse of the lower parts, and creating different soc/boards if need be.
So I was aiming close to your b) proposal.

Let me mention that I do share objectives 1) ,2) and fast turnaround, in your first list and a couple more: 5) ability to regression test fast ; 6) fully controlled integration tests (with "external" test code which can monitor and modify the app and OS state/variables); 7) fully reproducible tests ; 8) ability to debug and instrument without affecting the result (e.g. ability to freeze the whole universe in a breakpoint); 9) ability to plug to external tools which may be able to run much faster or much slower than real time.

Regarding Carles' 4th point, that is what QEMU is partially meant to provide (you run the same image, with the same memory layout, but do on-the-fly "translation" to the native instruction set; there is no intent to provide timing realism apart from the feeling given by the system time running at ≈ real time).

So you can say that the spectrum is roughly like:
[Image: port_vs_qemu_vs]

Please note that I target speed-ups of more than 100x, although, as proven by the latest commit, nothing prevents slowing things down, so you can have a HW model/driver which actually interfaces with a real Ethernet card, socket or similar.

Another note: If a low priority thread unlocks a mutex a higher priority thread is waiting for, you will have a context switch. Threads are still kicked out like in the real target. The difference is that due to the "speed" of the CPU, HW interrupts come while in the idle loop.

@lpereira
Collaborator

lpereira commented Oct 6, 2017

I'll be happy if I can run Zephyr under rr, Valgrind, or built with AddressSanitizer or UndefinedBehaviorSanitizer. I've debugged quite a few parts of Zephyr by taking the code out, building it for Linux, and using these things to find my way out.

Other things that could be used with this port are tools like afl or libFuzzer, which help find the bugs that a test suite alone is unable to find.

Being able to use these tools without bending the world would be an invaluable debugging aid. So much so that, for me, this trumps every other requirement (such as simulating nested interrupts, or anything resembling "real hardware" too much).

@nashif
Member

nashif commented Oct 17, 2017

@aescolar-ot what is the status with this PR? Do you need any help?

@aescolar
Member Author

aescolar commented Oct 18, 2017

@nashif Some help would be very welcome. It seems there is a bit of work ahead before this could be merged (please correct me or add as needed):
1. Use the build system (== create kconfig and makefiles)
2. Include in sanitycheck [Done as of 2017/11/27]
3. Debug and fix anything sanitycheck may reveal [Done as of 2017/12/01]
4. Beautify the code (refactor things a bit better, code style, silly names, etc) [Done as of 2017/12/06]

Regarding 1. and 2., I'm thinking that any of you would be able to do it much faster than me, as you know the setup, while for me that would be an uphill reverse engineering task.
Regarding 3. and 4. it would be logical that it would be me.

Would any of you be willing to take 1. and maybe also 2.?

@pfalcon
Collaborator

pfalcon commented Oct 18, 2017

@aescolar-ot : Let me try to chime in, mostly just focusing your attention on what was said before. In your list above, p.0 should be:

  0. Rework to follow the native Zephyr codestyle.

That would help more people to look at your patch (so far what one immediately sees is codestyle violation in every 2nd line, then there's https://en.wikipedia.org/wiki/Fail-fast approach).

@nashif
Member

nashif commented Oct 18, 2017

Beautify the code (refactor things a bit better, code style, silly names, etc)

You can speed this up by running your code through uncrustify :)

install uncrustify
uncrustify --replace --no-backup -l C -c $ZEPHYR_BASE/scripts/uncrustify.conf *.c

@aescolar
Member Author

aescolar commented Oct 18, 2017

@pfalcon thanks.
Forgive my ignorance but, why does the coding style prevent reviewing the code?

Anyhow, now checkpatch is happier. Just note that:

  • I did not clean up the code in the garbage folder, as that is meant to be thrown away as soon as 1. is done
  • I ignored NEW_TYPEDEFS : as that is also in the other archs, I assume it is not a problem
  • I also ignored PREFER_KERNEL_TYPES in the board, as the board is not really meant to see kernel types

@nashif Thanks :)

@aescolar
Member Author

aescolar commented Oct 18, 2017

@pfalcon Is the code now good enough to review? :)
Or in other words, would you think 0. is now done enough to not prevent other work on it? (Note that the code still needs more massaging as part of 4., and that the only dependencies are 1. => 2. => 3.; 4. can be done at any moment.)

nashif and others added 25 commits December 27, 2017 10:44
This adds support of 'make run' to the native port allowing us to run
applications natively on the host instead of qemu.

Signed-off-by: Anas Nashif <[email protected]>
All runner logic was implemented in qemu.cmake, remove the generic stuff
and make qemu.cmake qemu specific.

Signed-off-by: Anas Nashif <[email protected]>
To indicate the generated binary is executable on the host, add .exe
extension to the generated ELF file.

Signed-off-by: Anas Nashif <[email protected]>
Native port now adds .exe to the generated ELF

Signed-off-by: Alberto Escolar Piedras <[email protected]>
Signed-off-by: Anas Nashif <[email protected]>
added missing define for posix arch

Signed-off-by: Alberto Escolar Piedras <[email protected]>
benchmark/app_kernel test was giving a float exception
if the operations were performed faster than the
system timer resolution.
Added a safety macro in all divisions to avoid the fault

Signed-off-by: Alberto Escolar Piedras <[email protected]>
With the native port we are able to generate coverage reports. Add the
needed options to the compiler and add a kconfig option to enable this
on the supported architectures.

Signed-off-by: Anas Nashif <[email protected]>
allow to compile with posix arch

Signed-off-by: Alberto Escolar Piedras <[email protected]>
Signed-off-by: Anas Nashif <[email protected]>
For the POSIX arch we rely on the native OS to handle
segfaults, and stack overflows.
So that we can debug them with normal native tools.
Therefore these 2 are ifdef'ed for this arch in this test

Signed-off-by: Alberto Escolar Piedras <[email protected]>
terminate process as soon as the testcase is done

Signed-off-by: Alberto Escolar Piedras <[email protected]>
Signed-off-by: Anas Nashif <[email protected]>
POSIX arch is not limited to 200 chars in sprintf

Signed-off-by: Alberto Escolar Piedras <[email protected]>
replaced manual busy wait loop in test with
k_busy_wait()

Signed-off-by: Alberto Escolar Piedras <[email protected]>
fix in busy waits in test/kernel/common for the POSIX
arch

Signed-off-by: Alberto Escolar Piedras <[email protected]>
tests/kernel/tickless/tickless_concept fix in
infinite wait loops for POSIX ARCH

Signed-off-by: Alberto Escolar Piedras <[email protected]>
Signed-off-by: Alberto Escolar Piedras <[email protected]>
test/timer/timer_api use k_busy_wait to implement the
tests' busy_wait_ms, for archs which require a different
type of busy waiting

Signed-off-by: Alberto Escolar Piedras <[email protected]>
Added small delay in each iteration of the critical_loop
loop for the posix arch:
For this arch this loop and critical_rtn would otherwise
run in 0 time and therefore the test would never finish.

Signed-off-by: Alberto Escolar Piedras <[email protected]>
Signed-off-by: Anas Nashif <[email protected]>
A couple of infinite wait loops fixed for posix arch

Signed-off-by: Alberto Escolar Piedras <[email protected]>
The following 3 testcases are blacklisted for the POSIX
arch / simple_process BOARD:
* tests/drivers/ipm : won't compile due to missing
   __stdout_hook_install()  [part of minimal libc]
  (POSIX arch uses the native libc)
* tests/kernel/mem_protect/stackprot : will crash
  "natively" when trying to corrupt the stack and therefore
  will fail the testcase. The current understanding is that
  the POSIX arch should let the native OS handle faults,
  so they can be debugged with the native tools.
* samples/cpp_synchronization : it is not possible
  to build cpp code yet on top of the posix arch

Signed-off-by: Alberto Escolar Piedras <[email protected]>
We are getting warnings when building with native port:
 used with length equal to number of elements without multiplication by
 element size

This fixes the call of memset.

Signed-off-by: Anas Nashif <[email protected]>
We can run the benchmarks on other architectures beside x86

Signed-off-by: Anas Nashif <[email protected]>
search for .exe file when running native applications in sanitycheck.

Signed-off-by: Anas Nashif <[email protected]>
Support running native applications in sanitycheck using a new handler.

Signed-off-by: Anas Nashif <[email protected]>
Signed-off-by: Anas Nashif <[email protected]>
Signed-off-by: Alberto Escolar Piedras <[email protected]>
@nashif nashif merged commit be1d409 into zephyrproject-rtos:master Dec 27, 2017
Labels
RFC Request For Comments: want input from the community
10 participants