-
Notifications
You must be signed in to change notification settings - Fork 7.3k
shell: telnet: Don't close the connection on ENOBUFS error #67596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
shell: telnet: Don't close the connection on ENOBUFS error #67596
Conversation
If there's not enough networking buffers at certain moment, they might become available later. So instead of closing connection (and failing assertation) sleep and retry. This avoid the following assertion failure when there's much of data to send: net_pkt: Data buffer (1500) allocation failed. net_tcp: conn: 0x20076024 packet allocation failed, len=1460 shell_telnet: Failed to send -105, shutting down ASSERTION FAIL [err == 0] @ .../subsys/shell/shell_ops.c:416 os: r0/a1: 0x00000004 r1/a2: 0x000001a0 r2/a3: 0x00000004 os: r3/a4: 0x20044380 r12/ip: 0x00001958 r14/lr: 0x080c9027 os: xpsr: 0x41000000 os: Faulting instruction address (r15/pc): 0x0811ed26 os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0 os: Current thread: 0x20045100 (shell_telnet) os: Halting system Signed-off-by: Miika Karanki <[email protected]>
d9ff918
to
a78f44c
Compare
@rlubos and @jukkar : could you also take a look as you've been reviewing #66287 as well? I'm bit unsure is it ok as a principle to handle the buffer allocation problem this way as the root cause of the assertion failure this is addressing is running out of net buffers, but in our case this seems to solve the issue, nicely logging the buffer allocation error as well over the telnet if the shell logging is enabled. As the shell is often used as a debug/development tool, crashing due to the shell not being able to send data is bit inconvenient when assertions are enabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks ok (running out of buffers shouldn't be the reason to close the connection). However that assert at the shell level seems concerning to me, if I understand correctly it would hit as well if the peer simply closes the connection. Perhaps we should have some connection recovery mechanism at the telnet shell backend (just thinking aloud here, of course not asking to handle it in this PR).
Also at that point, probably the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. It might make sense to have some limiter here, so if the system is out-of-buffers, we do not sleep forever. Just a suggestion for future PR.
You're right @rlubos ! Simply doing ... causes the assertion failure due to
I'm also seeing another failure sometimes before the above happens:
... but I'm using Zephyr 3.4.0 here and there's lots changes since that in net_context.c so that might not be relevant anymore. Retesting with 3.5.0 requires a bit of work for me. |
I am not seeing errors with latest upstream in qemu_x86. Tried with this
and with this command |
But with
|
I created a bug report of the crash issue in #67637 |
If there's not enough networking buffers at certain moment, they might become available later. So instead of closing connection (and failing assertation) sleep and retry. This avoid the following assertion failure when there's much of data to send:
This is related to PR #66287 which implemented not closing the connection if write fails due to EAGAIN.