-
Notifications
You must be signed in to change notification settings - Fork 7.4k
include: net: Fix incorrect casting of timeval on zsock_timeval. #29962
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
include: net: Fix incorrect casting of timeval on zsock_timeval. #29962
Conversation
99acc78
to
41c349d
Compare
When the original code was written, there were assumptions that: a) Zephyr is intended to be small embedded RTOS, so unneeded code should be avoided; b) the layout of zsock_timeval was specifically made to match layout of Newlib's (as bundled with Zephyr SDK) struct timeval. |
Maybe I will explain the reason why I proposed this change, to clarify situation. I met with the problem that using CONFIG_POSIX_API=y leads to the situation when timeval has tv_sec field of 8 B size and tv_usec of 4 B size, while for zsock_timeval both of them are 4 B. Because of the direct casting of one type to another, field values are not assigned in the expected way and select method is not working properly (time spent on sleeping is different from the expected). You wrote that there were few assumptions made, so is it kind of desired operation? If solution that I proposed is wrong, could you suggest how can I fix my problem? |
Perhaps, the most useful information here would be: in what config you met that situation (what libc, etc.) I don't say that this change is wrong, just that the original idea was to support CONFIG_POSIX_API with just Newlib, as otherwise there're just too many combinations (often of unknowns) to support. Of course, if people want to support more combos, why not. |
Again, not saying it's wrong, but another option would be, for the case of CONFIG_POSIX_API, to rely on the declaration of |
@pfalcon
where, by default,
Thus
Hence if you say that both structures should match in ARM+newlib scenario it's simply not the case. Should we then fix |
@Damian-Nordic: So, newlib was upgraded over this time too. Anyway, the commit message here said "wrongly assumes", and I proceeded to explain that it actually was the "baseline assumption", which is of course holds only under specific setup. How to fix it - add layers of marshalling like this patch does, or uphold the original idea of avoiding that whenever possible - is up to new maintainers of the POSIX subsys (when I was one, I clearly leaned on the side to avoid adding extra layers, whenever possible. But the lack of consensus and clear direction how to do things for the subsys is why I resigned.) |
Ok then, @aescolar could you comment on this PR, as I found that you are POSIX maintainer? |
@kkasperczyk-no, @Damian-Nordic: Generally, this patch should be fine. I'm sure it will be approved. I apologize for not doing that myself, as it literally goes sideways with the direction I tried to move the POSIX subsys in, and spent a lot of effort on that. (But neither can I encourage you to go the same way, as I myself very well know that it's full of obstacles and stumble points, vs just applying local workarounds like this.) |
@pfalcon That's fine. We appreciate your comments and we don't want to negatively impact the code quality, but we may not understand all the design principles of the POSIX layer, so it's hard for us to make the decision which way to go. Suppose that we want to follow the rule that all Zephyr structures match libc counterparts. Why isn't that relationship expressed explicitly in the code? I can see a lot of structures are assumed to be the same, but this assumption doesn't seem to be checked anywhere. Is it because we don't want inner Zephyr layers depend on the POSIX layer (i.e. we want the Socket layer to stay the same regardless of how it is used)? If so, isn't the marshaling approach that we proposed more natural? |
Because the whole POSIX subsys is "experimental" and many things about it are "RFC". There were various proposals, e.g. #17706, #13787, #13054, but I never received enough feedback on them.
Yes, that's kind of how it all started - POSIX subsys is just a layer on top of native Zephyr API. But as I delved into the subsys (and I inherited it from other folks, and trust me, it was in much worse shape then), and thought how to optimize various parts of POSIX API implementation, it became clear to me that to achieve that, Zephyr kernel would need to be tweaked to be more POSIX-friendly. But whether that should be done or not depends on what place POSIX support has in Zephyr, and even that was never answered well and clear.
In Socket layer you also can see traces of its evolution - it started as experimental layer on top of "native networking" API, and the first main goal was to show concerned people that it's not big bloat and a very viable alternative to the adhoc API. It won and became the main Zephyr networking API, but some its initial traits (including small code, etc. overhead) are there. E.g., by all means, CONFIG_NET_SOCKETS_POSIX_NAMES is a chore, and if someone says "It's a chore, let's drop it", I wouldn't argue. Myself, I'm not the best person to do that, given how much effort I spent to implement and maintain it that way already (so, it's easier for me to apply yet another workaround when it bites).
Maybe it is, but: a) it requires stack allocation for extra struct copy; b) requires dummy code to move values around. Obviously, for one case, it's nothing, but as you understand, we discuss the general approach. And my approach had always been "Nobody's going to optimize it later, so think about "small" things (multiplied in dozens/hundreds) right away." |
41c349d
to
aaa442e
Compare
@pfalcon I tried to follow your suggestions and not make local fix. Could you verify if it's what you meant? |
aaa442e
to
b48380f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I appreciate giving it a try! However, I meant more along the lines of:
Can you give that a quick try to see if it works (it should, but that's the magic of POSIX subsys that even seemingly simple changes may lead to unobvious effects) ? |
f319ece
to
74d7714
Compare
Currently zsock_timeval implementation uses long type for both tv_sec and tv_usec fields. In the select method timeval type is directly casted on zsock_timeval, but in some cases (e.g. when using CONFIG_POSIX_API and some specific libc version) both types have fields of different sizes, what may lead to assigning them incorrect values. Using types declared by the used libc in the CONFIG_POSIX_API=y case will let to keep timeval and zsock_timeval types compatible. Signed-off-by: Kamil Kasperczyk <[email protected]>
74d7714
to
c7f6229
Compare
@pfalcon sorry for slow responding, but I had some other work to do. I tried to introduce solution proposed by you and it partially worked, but there was a problem with lack od timeval declaration for older Newlib version. I declared timeval structure for the older Newlib as a workaround, because including <sys/time.h>, where this workaround is already done didn't help in every case. The problem I had was "No such file or directory <sys/time.h>" compilation error in CI for some samples (e.g. tests/posix/eventfd and tests/posix/eventfd_basic) and platforms (e.g. qemu_cortex_r5). It was quite weird, because when I compiled it locally everything was fine, so I guess that the success depends on the including order inside of the Zephyr. Current solution seems to work and build on every platform properly. |
I once again appreciate giving a try to an alternative solution. In all fairness, I didn't expect you to, or I would have said "please spend nor more than half an hour of it, if it didn't work out, can go with a simpler workaround". And well, now I see for yourself how it may be hard (in the sense of "tangled") to get POSIX subsys and libc integration "right". Looks good (as an input for next stage, where we should find better ways to deal with it), thanks! |
Currently zsock_timeval implementation uses long type for both
tv_sec and tv_usec fields. In the select method timeval type is
directly casted on zsock_timeval, but in some cases (e.g. when
using CONFIG_POSIX_API and some specific libc version) both types
have fields of different sizes, what may lead to assigning them
incorrect values.
Using types declared by the used libc in the CONFIG_POSIX_API=y
case will let to keep timeval and zsock_timeval types compatible.
Signed-off-by: Kamil Kasperczyk [email protected]