native: Added arch,soc & board to run Zephyr natively in a POSIX OS #4174
Just noticed that after building the synchronization sample, it runs really fast where it should be sleeping for half a second. Any idea what is going on?
I understand this is WIP code but a few things stand out:
- Coding style. We have a particular coding style for Zephyr that this patch doesn't follow at all. https://www.zephyrproject.org/doc/contribute/contribute_guidelines.html#coding-style You need to fix this ASAP, and verify that it is in compliance using checkpatch.pl.
- The complete lack of interrupts is very concerning. This basically means that all threads become cooperative, and irq_lock() is a no-op. It seems to me that it will be quite easy to write code that works well in this particular environment, but will not work at all on real HW due to concurrency problems. I would really suggest trying a different approach here, possibly something that uses POSIX signals to emulate timer interrupts.
- Sanitycheck integration. The way you are going to prove that this works is by showing that all of our unit tests pass at runtime using the sanitycheck tool. On other arches it does this by running them under QEMU. I'd like to see support added to sanitycheck to run all the test cases in the POSIX environment.
Description says:
I wonder if that's it ;-). |
The feature is absolutely great, +1 for having it. I didn't look at the code yet though. |
I will read the code more carefully, but I agree with @pfalcon: this is indeed a welcome feature; thanks for working on it. From a quick glance, though, I'd like to add to the comments made already. I suggest steering clear of signals to simulate interrupts. Being asynchronous, their handling is tricky and very prone to headaches. It might be cleaner to just have a thread that blocks on a read from a socketpair or a pipe and calls the appropriate ISRs (this would better simulate an interrupt context, anyway). In fact, it might be a good idea to have a main loop of sorts, to handle things like timers (even the system timer), interrupts, and power management; this will make the port a lot cleaner. Supporting something like a tickless kernel would be pretty easy. Writing a main loop that is portable can be quite tricky, so I suggest using a library such as libuv or libevent.
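The "thread that blocks on a pipe and calls the ISRs" idea could look roughly like the sketch below. This is only an illustration of the suggestion, not code from this PR: the names irq_sim_start(), irq_sim_connect(), irq_sim_raise(), irq_sim_stop(), isr_table and demo_isr() are all made up for the example, and it assumes a plain POSIX host with pthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define N_IRQS 8

typedef void (*isr_t)(void);      /* hypothetical ISR signature */
static isr_t isr_table[N_IRQS];   /* hypothetical vector table */
static int irq_pipe[2];           /* [0] = read end, [1] = write end */
static pthread_t irq_thread;

static int demo_isr_hits;         /* example ISR state, for illustration only */
static void demo_isr(void) { demo_isr_hits++; }

/* Dispatcher thread: sleeps in read() until a HW model raises an IRQ. */
static void *irq_dispatcher(void *arg)
{
	uint8_t irq;

	(void)arg;
	while (read(irq_pipe[0], &irq, 1) == 1) {
		if (irq < N_IRQS && isr_table[irq] != NULL) {
			isr_table[irq]();   /* runs in "interrupt context" */
		}
	}
	return NULL;                    /* EOF: write end was closed */
}

int irq_sim_start(void)
{
	if (pipe(irq_pipe) != 0) {
		return -1;
	}
	return pthread_create(&irq_thread, NULL, irq_dispatcher, NULL);
}

/* Register an ISR; call this before raising the corresponding IRQ. */
void irq_sim_connect(uint8_t irq, isr_t isr)
{
	assert(irq < N_IRQS);
	isr_table[irq] = isr;
}

/* Called by a HW model to raise an interrupt; returns 0 on success. */
int irq_sim_raise(uint8_t irq)
{
	return write(irq_pipe[1], &irq, 1) == 1 ? 0 : -1;
}

/* Close the write end so the dispatcher drains the pipe and exits. */
void irq_sim_stop(void)
{
	close(irq_pipe[1]);
	pthread_join(irq_thread, NULL);
	close(irq_pipe[0]);
}
```

Note that this sketch does not model nesting or masking; an irq_lock() equivalent would need extra state that the dispatcher checks before calling an ISR.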
My thought process here was that indeed signals are tricky, but so are real interrupt handlers: both can be delivered at any time unless specifically suppressed, and code needs to be careful to properly lock around them in critical sections. I'm not immediately seeing how doing this in the context of a dedicated thread would be a better simulation, especially if we are trying to simulate nested interrupts.
+1 for @andrewboie's comment - signals are indeed async, and that's why they're the closest thing to real IRQs on POSIX.
Thanks for looking into this. This solution is based on some assumptions about what it is that we try to do. The arch only emulates the thread creation and context switching. It is based on POSIX threads. The next thing, and this is the first big assumption, is in the SOC: The next thing is the board, which assumes you do not want in any way to link the simulated time to real time. [@nashif That is why half a simulated second passes very fast in real time]. The board provides all HW models, including the IRQ controller, and therefore also the IRQ handling.
Now the background for these assumptions == What it is that we try and don't try to do with a soc/board like this.
The aim is NOT to debug HW/SW races, missed HW programming deadlines, or cases in which an interrupt comes when you did not expect it. I think the proper way to debug that would be with a cycle-accurate ISS (or a dev board). Philosophical background:
One comment more, to clarify. If you want to slow down things to run in real time, because you want to, say, talk with a real server and have real timeouts, all you need to do is |
[Update 2017/12/04] : comment obsolete. See above |
I see where you're coming from but it still doesn't sit well with me that all threads which would normally be preemptible are forced to be non-preemptible under this target. I suppose we'll have to see how things look when we start running test cases under sanitycheck. I have a feeling a lot of stuff is not going to work due to the assumptions you have made about timing (or complete lack thereof). The coding style issues are a dealbreaker and must be addressed.
Thanks @andrewboie. Regarding the assumption of timing, if you can think of some particular case which could cause trouble, it could be useful to start looking into it. Would it help to imagine this target as if it were a real development board which has an outrageously high CPU clock compared to the final target, but that people like using because it provides very nice debugging capabilities and costs nothing?
Get sanitycheck working with this port.
Not really, since if I understand this port correctly, something like k_sleep(500) will return immediately and not 500 milliseconds later. Even very fast CPUs can be tuned such that timer APIs work as expected. Unless you've found a way to ensure that if thread A sleeps for 500ms and thread B sleeps for 250ms, thread B wakes up first, that sort of thing.
To the best of my understanding k_sleep() works as it is supposed to. You can just try the sync app with the latest commit I made. |
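To make the point about ordering concrete: a clock that is decoupled from real time can still wake sleepers in the right order, because the idle step simply jumps simulated time to the earliest pending deadline. The sketch below is a minimal illustration under that assumption; sim_sleep(), sim_wake_next() and sim_time_ms() are hypothetical names and this is not the timer model used in the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLEEPERS 8

struct sleeper {
	const char *name;
	uint64_t wake_at_ms;   /* absolute simulated time */
	int active;
};

static uint64_t sim_now_ms;                    /* simulated clock */
static struct sleeper sleepers[MAX_SLEEPERS];

/* Record a sleep request; returns 0 on success, -1 if the table is full. */
int sim_sleep(const char *name, uint64_t duration_ms)
{
	for (size_t i = 0; i < MAX_SLEEPERS; i++) {
		if (!sleepers[i].active) {
			sleepers[i] = (struct sleeper){ name,
					sim_now_ms + duration_ms, 1 };
			return 0;
		}
	}
	return -1;
}

/*
 * Idle step: nothing is runnable, so jump the simulated clock straight to
 * the earliest deadline and report who wakes up (NULL if nobody sleeps).
 * No real time needs to pass for this to happen.
 */
const char *sim_wake_next(void)
{
	struct sleeper *next = NULL;

	for (size_t i = 0; i < MAX_SLEEPERS; i++) {
		if (sleepers[i].active &&
		    (next == NULL || sleepers[i].wake_at_ms < next->wake_at_ms)) {
			next = &sleepers[i];
		}
	}
	if (next == NULL) {
		return NULL;
	}
	assert(next->wake_at_ms >= sim_now_ms);   /* time never goes back */
	sim_now_ms = next->wake_at_ms;
	next->active = 0;
	return next->name;
}

uint64_t sim_time_ms(void)
{
	return sim_now_ms;
}
```

With thread A sleeping 500 ms and thread B sleeping 250 ms, the first call to sim_wake_next() returns B at simulated time 250 and the second returns A at 500, regardless of how little wall-clock time elapsed.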
yes, works better now with latest commit. |
Here's my 2 cents about this after taking a quick glance at the code and discussing with @aescolar-ot offline. I believe we need to distinguish exactly what we want to achieve with a "native" or "POSIX" port. In my eyes what developers and users expect from a native port is the following:
Additionally, the fast turnaround time and simplified development when developing, in the future, display-enabled applications where the GUI is part of the app itself (i.e. integration with SDL or similar in the future) On the other hand, @aescolar-ot has another set of goals in mind, a bit different from the ones above:
Please correct me if I'm wrong, @aescolar-ot so we can all be on the same page. But assuming what I stated is correct, then I think we have 2 ways to move forward: a) Create 2 separate archs or ports: a "native" one that corresponds to the Zephyr users' expectations and a "sim" one that is used for high-speed simulation |
Thanks @carlescufi. I agree that there seems to be some misunderstanding about what the aim of the port should be. Different people can have different objectives in mind. Let me mention that I do share objectives 1), 2) and fast turnaround in your first list, and a couple more: 5) ability to regression test fast; 6) fully controlled integration tests (with "external" test code which can monitor and modify the app and OS state/variables); 7) fully reproducible tests; 8) ability to debug and instrument without affecting the result (e.g. ability to freeze the whole universe in a breakpoint); 9) ability to plug into external tools which may be able to run much faster or much slower than real time. Regarding Carles's 4th point, that is what QEMU is partially meant to provide (you run the same image, with the same memory layout, but do on-the-fly "translation" to the native instruction set. And there is no intent to provide timing realism apart from the feeling given by the system time running at ≈ real time). So you can say that the spectrum is roughly like: Please note that I target speed-ups of more than 100x, although as proven by the latest commit, there is nothing preventing slowing things down, so you can have a HW model/driver which actually interfaces with a real ethernet card, socket or similar. Another note: If a low priority thread unlocks a mutex a higher priority thread is waiting for, you will have a context switch. Threads are still kicked out like in the real target. The difference is that due to the "speed" of the CPU, HW interrupts come while in the idle loop.
I'll be happy if I can run Zephyr under rr, Valgrind, or built with AddressSanitizer or UndefinedBehaviorSanitizer. I've debugged quite a few parts of Zephyr by taking the code out, building it for Linux, and using these things to find my way out. Other things that could be used with this port are tools like afl or libfuzzer, which are helpful for finding those bugs that a test suite alone is unable to find. Being able to use these tools without bending the world would be an invaluable debugging aid. So much so that, for me, this trumps every other requirement (such as simulating nested interrupts, or anything resembling "real hardware" too much).
@aescolar-ot what is the status with this PR? Do you need any help? |
@nashif Some help would be very welcome. It seems there is a bit of work ahead before this could be merged (please correct me or add as needed): Regarding 1. and 2. I'm thinking that any of you would be able to do it much faster than me, as you know the setup while for me that would be an uphill reverse engineering task. Would any of you be willing to take 1. and maybe also 2.?
@aescolar-ot : Let me try to chime in, mostly just focusing your attention on what was said before. In your list above, p.0 should be:
That would help more people to look at your patch (so far what one immediately sees is codestyle violation in every 2nd line, then there's https://en.wikipedia.org/wiki/Fail-fast approach). |
You can speed this up by running your code through uncrustify :) install uncrustify |
@pfalcon thanks. Anyhow, now checkpatch is happier. Just note that:
@nashif Thanks :) |
@pfalcon Is the code now good enough to review? :) |
This adds support of 'make run' to the native port allowing us to run applications natively on the host instead of qemu. Signed-off-by: Anas Nashif <[email protected]>
All runner logic was implemented in qemu.cmake, remove the generic stuff and make qemu.cmake qemu specific. Signed-off-by: Anas Nashif <[email protected]>
To indicate the generated binary is executable on the host, add .exe extension to the generated ELF file. Signed-off-by: Anas Nashif <[email protected]>
Native port now adds .exe to the generated ELF Signed-off-by: Alberto Escolar Piedras <[email protected]> Signed-off-by: Anas Nashif <[email protected]>
added missing define for posix arch Signed-off-by: Alberto Escolar Piedras <[email protected]>
benchmark/app_kernel test was giving a float exception if the operations were performed faster than the system timer resolution. Added a safety macro in all divisions to avoid the fault Signed-off-by: Alberto Escolar Piedras <[email protected]>
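For context on the commit above: on a Linux/x86 host an integer division by zero raises SIGFPE, which the shell reports as a "floating point exception", and a measured elapsed time of 0 ticks is quite possible when the host is much faster than the timer resolution. A guard of the kind described might look like the sketch below; SAFE_DIVISOR and the two helper functions are hypothetical names for illustration, not the macro that was actually added to the benchmark.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical guard: substitute 1 for a zero divisor to avoid the trap. */
#define SAFE_DIVISOR(d) (((d) != 0) ? (d) : 1)

/* Operations per timer tick; elapsed_ticks may be 0 on a very fast host. */
static inline uint32_t ops_per_tick(uint32_t n_ops, uint32_t elapsed_ticks)
{
	return n_ops / SAFE_DIVISOR(elapsed_ticks);
}

/* Average ticks per operation; n_ops may be 0 if nothing completed. */
static inline uint32_t avg_ticks_per_op(uint32_t elapsed_ticks, uint32_t n_ops)
{
	return elapsed_ticks / SAFE_DIVISOR(n_ops);
}

/* Small usage example: none of these calls can divide by zero. */
void safe_div_demo(void)
{
	assert(ops_per_tick(1000, 0) == 1000);  /* 0 ticks treated as 1 */
	assert(ops_per_tick(1000, 4) == 250);
	assert(avg_ticks_per_op(0, 10) == 0);
}
```

The result for a zero divisor is of course not a meaningful measurement; the guard only keeps the benchmark from crashing.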
With the native port we are able to generate coverage reports, add the needed options to the compiler and add a kconfig option to enable this on the supported architectures. Signed-off-by: Anas Nashif <[email protected]>
allow to compile with posix arch Signed-off-by: Alberto Escolar Piedras <[email protected]> Signed-off-by: Anas Nashif <[email protected]>
For the POSIX arch we rely on the native OS to handle segfaults, and stack overflows. So that we can debug them with normal native tools. Therefore these 2 are ifdef'ed for this arch in this test Signed-off-by: Alberto Escolar Piedras <[email protected]>
terminate process as soon as the testcase is done Signed-off-by: Alberto Escolar Piedras <[email protected]> Signed-off-by: Anas Nashif <[email protected]>
POSIX arch is not limited to 200 chars in sprintf Signed-off-by: Alberto Escolar Piedras <[email protected]>
replaced manual busy wait loop in test with k_busy_wait() Signed-off-by: Alberto Escolar Piedras <[email protected]>
fix in busy waits in test/kernel/common for the POSIX arch Signed-off-by: Alberto Escolar Piedras <[email protected]>
tests/kernel/tickless/tickless_concept fix in infinite wait loops for POSIX ARCH Signed-off-by: Alberto Escolar Piedras <[email protected]>
Signed-off-by: Alberto Escolar Piedras <[email protected]>
test/timer/timer_api use k_busy_wait to implement the tests' busy_wait_ms, for archs which require a different type of busy waiting Signed-off-by: Alberto Escolar Piedras <[email protected]>
Added small delay in each iteration of the critical_loop loop for the posix arch: For this arch this loop and critical_rtn would otherwise run in 0 time and therefore the test would never finish. Signed-off-by: Alberto Escolar Piedras <[email protected]> Signed-off-by: Anas Nashif <[email protected]>
A couple of infinite wait loops fixed for posix arch Signed-off-by: Alberto Escolar Piedras <[email protected]>
The following 3 testcases are blacklisted for the POSIX arch / simple_process BOARD:
* tests/drivers/ipm : won't compile due to missing __stdout_hook_install() [part of minimal libc] (POSIX arch uses the native libc)
* tests/kernel/mem_protect/stackprot : will crash "natively" when trying to corrupt the stack and therefore will fail the testcase. The current understanding is that the POSIX arch should let the native OS handle faults, so they can be debugged with the native tools.
* samples/cpp_synchronization : it is not possible to build cpp code yet on top of the posix arch
Signed-off-by: Alberto Escolar Piedras <[email protected]>
We are getting warnings when building with the native port: "used with length equal to number of elements without multiplication by element size". This fixes the call to memset. Signed-off-by: Anas Nashif <[email protected]>
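The class of bug behind that warning is passing an element count where memset() expects a byte count. The sketch below shows the mistake and the fix side by side; clear_wrong() and clear_right() are made-up names, and the functions take a pointer rather than an array so that the example itself compiles without triggering the warning. It is not the actual code changed by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Buggy: n_elems is an element count, but memset() takes a byte count,
 * so only the first n_elems bytes of the buffer are cleared.
 */
void clear_wrong(uint32_t *buf, size_t n_elems)
{
	memset(buf, 0, n_elems);
}

/* Fixed: scale the element count by the element size. */
void clear_right(uint32_t *buf, size_t n_elems)
{
	memset(buf, 0, n_elems * sizeof(buf[0]));
}
```

With a 4-element uint32_t array, clear_wrong() zeroes only the first element (4 bytes), while clear_right() zeroes all 16 bytes.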
We can run the benchmarks on other architectures beside x86 Signed-off-by: Anas Nashif <[email protected]>
search for .exe file when running native applications in sanitycheck. Signed-off-by: Anas Nashif <[email protected]>
Support running native applications in sanitycheck using a new handler. Signed-off-by: Anas Nashif <[email protected]>
Signed-off-by: Anas Nashif <[email protected]> Signed-off-by: Alberto Escolar Piedras <[email protected]>
Signed-off-by: Anas Nashif <[email protected]>
Proof of concept of a native port of Zephyr to run on a POSIX OS.
It includes:
A new arch (posix) which relies on pthreads to emulate the context switching
A new soc for it (inf_clock) which emulates a CPU running at an infinitely high
clock (so when the CPU is awakened it runs till completion in 0 time).
A new board, which provides a trivial system tick timer and irq generation, to be able to run as a standalone console program.
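As a rough illustration of the first item, emulating context switching on top of pthreads can be done by giving each emulated thread its own pthread and letting only one of them run at a time. The sketch below is a simplified assumption of how that might work, not the implementation in this PR; wait_until_scheduled(), hand_over(), thread_entry() and run_demo() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define N_THREADS 2
#define ROUNDS    3

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int current;                     /* id of the only thread allowed to run */
static int trace[N_THREADS * ROUNDS];   /* order in which threads ran */
static size_t trace_len;

/* Block the calling pthread until the emulated CPU is given to `self`. */
static void wait_until_scheduled(int self)
{
	pthread_mutex_lock(&lock);
	while (current != self) {
		pthread_cond_wait(&cond, &lock);
	}
	pthread_mutex_unlock(&lock);
}

/* Emulated context switch: give the CPU to `next` and wake the waiters. */
static void hand_over(int next)
{
	pthread_mutex_lock(&lock);
	current = next;
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&lock);
}

static void *thread_entry(void *arg)
{
	int self = (int)(intptr_t)arg;

	for (int i = 0; i < ROUNDS; i++) {
		wait_until_scheduled(self);
		/* Only the scheduled thread gets here, so no extra locking. */
		assert(trace_len < N_THREADS * ROUNDS);
		trace[trace_len++] = self;
		hand_over((self + 1) % N_THREADS);
	}
	return NULL;
}

/* Start the emulated threads and wait for them; returns 0 on success. */
int run_demo(void)
{
	pthread_t t[N_THREADS];

	for (int i = 0; i < N_THREADS; i++) {
		if (pthread_create(&t[i], NULL, thread_entry,
				   (void *)(intptr_t)i) != 0) {
			return -1;
		}
	}
	for (int i = 0; i < N_THREADS; i++) {
		pthread_join(t[i], NULL);
	}
	return 0;
}
```

Although the host sees two pthreads, the recorded trace strictly alternates between them, which is the property a kernel built on this scheme would rely on.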
Note that this does not provide Kconfig and Makefiles to integrate with the default build system.
All this is work in progress.
The garbage/ folder is not meant to be merged ever (please ignore it)
Origin: Original
Fixes #1891
Signed-off-by: Alberto Escolar Piedras [email protected]
I haven't dug into the build system, so you cannot configure it to build through the normal build flow. But I append a trivial Makefile in the garbage folder you can use to build it. If you want to try it, in the repo root, run
export COMPILE_OUTPUT_DIR=foobar/
make -f garbage/Makefile compile
that will build the philosophers app into foobar/zephyr_posix_exe
you can then run that binary directly (or in gdb if you want).
If you want to build another sample app, modify garbage/c_files_list accordingly
Please note that all of it is quite hacky at this point. The aim of this pull request is to start a discussion about the fitness of the approach, and to gather comments (if you have many, please focus on the most important matters and ignore typos and similar)
[updated 2017/11/28] : see nashif's comment below for how to build. Note that now sanitycheck can also be used:
sanitycheck -p simple_process