-
Notifications
You must be signed in to change notification settings - Fork 13.6k
lldb-server's GDB port map (--min/max-gdbserver-port) has a race condition when killing the debugee on a slow remote system #97537
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@llvm/issue-subscribers-lldb Author: David Spickett (DavidSpickett)
To reproduce, the remote needs to be something relatively slow. In my case QEMU.
Start an lldb-server on the remote in platform mode with some restricted ports:
Then connect lldb and run a program:
At this point lldb-server has started a new process to handle this client's connection and given it a port map that has one open port, that it has used to start a gdb-server process. From here, if you finish the program then run it again, everything is fine. The gdbserver is torn down and the port is freed, then reused for the new gdbserver. However, if you run before the program finishes there is a race between the platform killing the gdbserver process and the platform handling the launch gdb server request packet:
We can see that the packets are sent in the right order:
On the server side it uses
(introduced by #88845, which I think made the feature overall better, but in doing so exposed this issue) If the remote is slow enough that the kill doesn't actually finish before we get back there, then it won't and it'll try to find a free port, find none, and fail to launch the gdb server. Some ad-hoc logging shows this on the server side:
This does not happen debugging locally on real hardware. I discovered this running some of the SVE tests again. They run the debugee once to discover the supported vector lengths and then again to run the actual test case. Adding a The workaround is to not use a port map at all, but often giving full network access to a VM is difficult. So I'd like to find a way to make this work, or at least work around it in the tests most likely to be run this way. |
I have investigated this issue little bit. The variable llvm-project/lldb/tools/lldb-server/lldb-platform.cpp Lines 355 to 359 in 0870afa
It seems the condition if (!port) is never met because std::optional<uint16_t> port has a value. It is a bug and must be fixed!llvm-project/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerPlatform.cpp Lines 160 to 169 in 0870afa
Then port = 0 is always passed to StartDebugserverProcess() called from GDBRemoteCommunicationServerPlatform::LaunchGDBServer().But StartDebugserverProcess() does not use the passed port value anyway and updates the port with a new value (child_port). llvm-project/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunication.cpp Lines 1159 to 1171 in 76e37b1
Note m_port_map.AssociatePortWithProcess(*port, pid) will silently fail here because a new port value is missing in portmap_for_child .llvm-project/lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerPlatform.cpp Lines 209 to 221 in 7b135f7
gdbserver_portmap is useless because it does not reflect the actual port used.StartDebugserverProcess() will try to listen on 127.0.0.1:0 only if url is nullptr, but url is always defined if the protocol is tcp. Otherwise pipes are used (binding to port zero).Here is the log of lldb-server processes (parent and child) on remote Linux
Note the port I don't see where Probably it is better to revert #88845 since the port mapping does not work as expected anyway. But #88845 caused test failures on cross builds. |
TestPlatformLaunchGDBServer.py runs ldb-server w/o --min-gdbserver-port, --max-gdbserver-port or --gdbserver-port, so gdbserver_portmap is empty and gdbserver_portmap.GetNextAvailablePort() will return 0. Do not pass 0 to portmap_for_child in this case. Added few asserts in GDBRemoteCommunicationServerPlatform::PortMap to avoid this in the future. This patch fixes llvm#97537. It seems StartDebugserverProcess() ignores the port anyway and parameters --min-gdbserver-port, --max-gdbserver-port and --gdbserver-port are useless at all, but it is out of scope of this patch.
Start from They've always been a bit dodgy but they are useful for working on simulators. |
Currently the port in
Note usually it is better to configure simulators (qemu) to use a network bridge with own IP w/o any port limits and do not map ports manually to host's ports. Note I'm working on a patch to fix this issue and all port mapping related issues. The idea is to use threads for platform mode and use a common port map for all connections instead of making a new |
…98833) TestPlatformLaunchGDBServer.py runs `ldb-server` w/o parameters `--min-gdbserver-port`, `--max-gdbserver-port` or `--gdbserver-port`. So `gdbserver_portmap` is empty and `gdbserver_portmap.GetNextAvailablePort()` will return 0. Do not call `portmap_for_child.AllowPort(0)` in this case. Otherwise `portmap_for_child.GetNextAvailablePort()` will allocate and never free the port 0 and next call `portmap_for_child.GetNextAvailablePort()` will fail. Added few asserts in `GDBRemoteCommunicationServerPlatform::PortMap` to avoid such issue in the future. This patch fixes a bug added in #88845. The behaviour is very close to #97537 w/o parameters `--min-gdbserver-port`, `--max-gdbserver-port` and `--gdbserver-port`.
…98833) Summary: TestPlatformLaunchGDBServer.py runs `ldb-server` w/o parameters `--min-gdbserver-port`, `--max-gdbserver-port` or `--gdbserver-port`. So `gdbserver_portmap` is empty and `gdbserver_portmap.GetNextAvailablePort()` will return 0. Do not call `portmap_for_child.AllowPort(0)` in this case. Otherwise `portmap_for_child.GetNextAvailablePort()` will allocate and never free the port 0 and next call `portmap_for_child.GetNextAvailablePort()` will fail. Added few asserts in `GDBRemoteCommunicationServerPlatform::PortMap` to avoid such issue in the future. This patch fixes a bug added in #88845. The behaviour is very close to #97537 w/o parameters `--min-gdbserver-port`, `--max-gdbserver-port` and `--gdbserver-port`. Test Plan: Reviewers: Subscribers: Tasks: Tags: Differential Revision: https://phabricator.intern.facebook.com/D60250877
…t mapping Removed fork(). Used threads and the common thread-safe port map for all platform connections. Updated lldb::FileSystem to use llvm::vfs::createPhysicalFileSystem() with an own virtual working directory per thread. This patch depends on llvm#100659, llvm#100666. This patch fixes llvm#97537, llvm#90923, llvm#56346. lldb-server has been tested on Windows with 50 connections and 100 processes launched simultaneously. Tested also the cross build with Linux x86_64 host and Linux Aarch64 target.
…t on Windows `lldb-server --server` works on Windows now w/o multithreading. The rest functionality remains unchanged. Added also PipeWindows::WriteWithTimeout(), fixed PipeWindows::ReadWithTimeout() and missing initialization of m_read_overlapped.hEvent in the constructor PipeWindows(lldb::pipe_t read, lldb::pipe_t write). Fixes llvm#90923, fixes llvm#56346. This is the part 1 of the replacement of llvm#100670. In the part 2 I plan to switch `lldb-server gdbserver` to use --fd and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fix llvm#97537.
…t on Windows `lldb-server --server` works on Windows now w/o multithreading. The rest functionality remains unchanged. Added also PipeWindows::WriteWithTimeout(), fixed PipeWindows::ReadWithTimeout() and missing initialization of m_read_overlapped.hEvent in the constructor PipeWindows(lldb::pipe_t read, lldb::pipe_t write). Fixes llvm#90923, fixes llvm#56346. This is the part 1 of the replacement of llvm#100670. In the part 2 I plan to switch `lldb-server gdbserver` to use --fd and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fix llvm#97537.
…t on Windows `lldb-server platform --server` works on Windows now w/o multithreading. The rest functionality remains unchanged. Added also PipeWindows::WriteWithTimeout(), fixed PipeWindows::ReadWithTimeout() and missing initialization of m_read_overlapped.hEvent in the constructor PipeWindows(lldb::pipe_t read, lldb::pipe_t write). Fixes llvm#90923, fixes llvm#56346. This is the part 1 of the replacement of llvm#100670. In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fix llvm#97537.
…t on Windows `lldb-server platform --server` works on Windows now w/o multithreading. The rest functionality remains unchanged. Added also PipeWindows::WriteWithTimeout(), fixed PipeWindows::ReadWithTimeout() and missing initialization of m_read_overlapped.hEvent in the constructor PipeWindows(lldb::pipe_t read, lldb::pipe_t write). Fixes llvm#90923, fixes llvm#56346. Depends on llvm#101326. This is the part 1 of the replacement of llvm#100670. In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fix llvm#97537.
…t on Windows `lldb-server platform --server` works on Windows now w/o multithreading. The rest functionality remains unchanged. Depends on llvm#101383. Fixes llvm#90923, fixes llvm#56346. This is the part 1 of the replacement of llvm#100670. In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fix llvm#97537.
…t mapping Removed fork(). Used threads and the common thread-safe port map for all platform connections. Updated lldb::FileSystem to use llvm::vfs::createPhysicalFileSystem() with an own virtual working directory per thread. This patch depends on llvm#100659, llvm#100666. This patch fixes llvm#97537, llvm#90923, llvm#56346. lldb-server has been tested on Windows with 50 connections and 100 processes launched simultaneously. Tested also the cross build with Linux x86_64 host and Linux Aarch64 target.
`lldb-server platform --server` works on Windows now w/o multithreading. The rest functionality remains unchanged. Fixes llvm#90923, fixes llvm#56346. This is the part 1 of the replacement of llvm#100670. In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fix llvm#97537.
#101283) `lldb-server platform --server` works on Windows now w/o multithreading. The rest functionality remains unchanged. Fixes #90923, fixes #56346. This is the part 1 of the replacement of #100670. In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fiх #97537.
Listen to gdbserver-port, accept the connection and run `lldb-server gdbserver --fd` on all platforms. Added acceptor_gdb and gdb_thread to lldb-platform.cpp SharedSocket has been moved to ConnectionFileDescriptorPosix. Parameters --min-gdbserver-port and --max-gdbserver-port are deprecated now. This is the part 2 of llvm#101283. Fixes llvm#97537.
Listen to gdbserver-port, accept the connection and run `lldb-server gdbserver --fd` on all platforms. Added acceptor_gdb and gdb_thread to lldb-platform.cpp SharedSocket has been moved to ConnectionFileDescriptorPosix. Parameters --min-gdbserver-port and --max-gdbserver-port are deprecated now. This is the part 2 of llvm#101283. Fixes llvm#97537, fixes llvm#101475.
Listen to gdbserver-port, accept the connection and run `lldb-server gdbserver --fd` on all platforms. Added acceptor_gdb and gdb_thread to lldb-platform.cpp SharedSocket has been moved to ConnectionFileDescriptorPosix. Parameters --min-gdbserver-port and --max-gdbserver-port are deprecated now. This is the part 2 of llvm#101283. Fixes llvm#97537, fixes llvm#101475.
Listen to gdbserver-port, accept the connection and run lldb-server gdbserver --fd on all platforms. Refactored Acceptor to listen on 2 ports in the main thread. Parameters --min-gdbserver-port and --max-gdbserver-port are deprecated now. This is the part 2 of llvm#101283. Fixes llvm#97537, fixes llvm#101475.
Listen to gdbserver-port, accept the connection and run lldb-server gdbserver --fd on all platforms. Refactored Acceptor to listen on 2 ports in the main thread. Parameters --min-gdbserver-port and --max-gdbserver-port are deprecated now. This is the part 2 of llvm#101283. Fixes llvm#97537, fixes llvm#101475.
Listen to gdbserver-port, accept the connection and run lldb-server gdbserver --fd on all platforms. Refactored Acceptor to listen on 2 ports in the main thread. Parameters --min-gdbserver-port and --max-gdbserver-port are deprecated now. This is the part 2 of llvm#101283. Fixes llvm#97537, fixes llvm#101475.
Listen to gdbserver-port, accept the connection and run lldb-server gdbserver --fd on all platforms. Parameters --min-gdbserver-port and --max-gdbserver-port are deprecated now. This is the part 2 of llvm#101283. Depends on llvm#106955. Fixes llvm#97537, fixes llvm#101475.
Listen to gdbserver-port, accept the connection and run lldb-server gdbserver --fd on all platforms. Parameters --min-gdbserver-port and --max-gdbserver-port are deprecated now. This is the part 2 of llvm#101283. Fixes llvm#97537, fixes llvm#101475.
Listen to gdbserver-port, accept the connection and run lldb-server gdbserver --fd on all platforms. Parameters --min-gdbserver-port and --max-gdbserver-port are deprecated now. This is the part 2 of llvm#101283. Fixes llvm#97537, fixes llvm#101475.
Listen to gdbserver-port, accept the connection and run lldb-server gdbserver --fd on all platforms. Parameters --min-gdbserver-port and --max-gdbserver-port are deprecated now. This is the part 2 of llvm#101283. Fixes llvm#97537, fixes llvm#101475.
Listen to gdbserver-port, accept the connection and run lldb-server gdbserver --fd on all platforms. Parameters --min-gdbserver-port and --max-gdbserver-port are deprecated now. This is the part 2 of llvm#101283. Fixes llvm#97537, fixes llvm#101475.
llvm#101283) `lldb-server platform --server` works on Windows now w/o multithreading. The rest functionality remains unchanged. Fixes llvm#90923, fixes llvm#56346. This is the part 1 of the replacement of llvm#100670. In the part 2 I plan to switch `lldb-server gdbserver` to use `--fd` and listen a common gdb port for all gdbserver connections. Then we can remove gdb port mapping to fiх llvm#97537. (cherry picked from commit 82ee31f)
To reproduce, the remote needs to be something relatively slow. In my case QEMU.
Start an lldb-server on the remote in platform mode with some restricted ports:
Then connect lldb and run a program:
At this point lldb-server has started a new process to handle this client's connection and given it a port map that has one open port, that it has used to start a gdb-server process.
From here, if you finish the program then run it again, everything is fine. The gdbserver is torn down and the port is freed, then reused for the new gdbserver.
However, if you run before the program finishes there is a race between the platform killing the gdbserver process and the platform handling the launch gdb server request packet:
We can see that the packets are sent in the right order:
On the server side it uses
kill()
to kill the process and it should then free the port here:llvm-project/lldb/tools/lldb-server/lldb-platform.cpp
Line 300 in 76c84e7
(introduced by #88845, which I think made the feature overall better, but in doing so exposed this issue)
If the remote is slow enough that the kill doesn't actually finish before we get back there, then it won't and it'll try to find a free port, find none, and fail to launch the gdb server. Some ad-hoc logging shows this on the server side:
This does not happen debugging locally on real hardware.
I discovered this running some of the SVE tests again. They run the debugee once to discover the supported vector lengths and then again to run the actual test case. Adding a
sleep(5)
in the test cases between those 2 runs also "fixes" the issue.The workaround is to not use a port map at all, but often giving full network access to a VM is difficult. So I'd like to find a way to make this work, or at least work around it in the tests most likely to be run this way.
The text was updated successfully, but these errors were encountered: