Commit ba86dd6

[Security] Document StatelessProcessGroup security concerns
A recent PR, vllm-project#15988, improved StatelessProcessGroup to ensure the torch.distributed TCPStore uses the specified IP address instead of binding to all interfaces. Upon closer inspection, this is quite important: the way vLLM uses this TCPStore includes pickled data, so malicious access to the TCPStore would allow remote code execution on a vLLM host.

Update some places throughout the code base to reflect the importance of specifying a secured IP address for use with this interface.

Finally, fix a couple of places in tests to explicitly use localhost instead of the IP we find that's (probably) the one used for the host's default route. Otherwise, a host running these tests is briefly vulnerable on the IP address chosen.

Signed-off-by: Russell Bryant <[email protected]>
1 parent 8c946ce commit ba86dd6
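
To illustrate why pickled data on a reachable TCPStore amounts to remote code execution, here is a minimal, harmless sketch. It assumes only the standard library; `Payload` is a hypothetical name and is not part of vLLM. The point is that the bytes being loaded, not the receiver, choose which callable runs during `pickle.loads`.

```python
import pickle


class Payload:
    """Hypothetical stand-in for attacker-controlled data (not vLLM code)."""

    def __reduce__(self):
        # Whoever builds the pickle chooses the callable and its arguments.
        # A benign builtin is used here; a real attacker would pick
        # something far more damaging.
        return (int, ("42",))


blob = pickle.dumps(Payload())

# The receiver only sees bytes. Loading them invokes int("42"),
# so the result is whatever the chosen callable returned.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

Because the receiver cannot tell from the bytes alone what will be called, the practical defense described in this commit is to keep the store reachable only from trusted hosts.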

5 files changed: 25 additions, 8 deletions
examples/offline_inference/rlhf.py

Lines changed: 7 additions & 0 deletions
@@ -11,7 +11,14 @@
 inference instance. In practice, there could be multiple training instances
 and multiple inference instances. For the full implementation, please refer
 to the OpenRLHF framework.
+
+It is important to set `VLLM_HOST_IP` to an address on a secure network when
+using this example. Unsecured communications between components will be used
+over this IP address and should NOT be exposed to untrusted networks. For more
+information, see:
+https://docs.vllm.ai/en/latest/deployment/security.html
 """
+
 import os
 
 import ray

examples/offline_inference/rlhf_utils.py

Lines changed: 6 additions & 0 deletions
@@ -27,8 +27,14 @@ class WorkerExtension:
     By defining an extension class, the code can work no matter what is
     the underlying worker class. This way, the code can be compatible
     with both vLLM V0 and V1.
+
     NOTE: we define this class in a separate module, and the main module
     should pass the full qualified name as `worker_extension_cls` argument.
+
+    The `master_address` parameter should be an address on a secure network that
+    is ideally completely isolated. Services used on this network are insecure
+    and will make the system vulnerable to remote code execution if exposed to
+    malicious parties.
     """
 
     def init_weight_update_group(self, master_address, master_port,

tests/distributed/test_same_node.py

Lines changed: 4 additions & 4 deletions
@@ -6,20 +6,20 @@
 
 from vllm.distributed.parallel_state import in_the_same_node_as
 from vllm.distributed.utils import StatelessProcessGroup
-from vllm.utils import get_ip, get_open_port
+from vllm.utils import get_open_port
 
 if __name__ == "__main__":
     dist.init_process_group(backend="gloo")
 
     rank = dist.get_rank()
     if rank == 0:
         port = get_open_port()
-        ip = get_ip()
+        ip = "127.0.0.1"
         dist.broadcast_object_list([ip, port], src=0)
     else:
-        recv = [None, None]
+        recv = [None, None]  # type: ignore
         dist.broadcast_object_list(recv, src=0)
-        ip, port = recv
+        ip, port = recv  # type: ignore
 
     stateless_pg = StatelessProcessGroup.create(ip, port, rank,
                                                 dist.get_world_size())
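
The difference between pinning a listener to loopback and binding to all interfaces can be sketched with plain standard-library sockets. This is only an illustration of the bind address, not how TCPStore is implemented; `bind_ephemeral` is a hypothetical helper name.

```python
import socket


def bind_ephemeral(host: str) -> tuple[str, int]:
    """Bind a TCP socket to `host` on an OS-chosen port and report the address.

    Hypothetical helper for illustration only.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS pick a free port
        return sock.getsockname()


# Reachable only from the local machine.
loopback_addr = bind_ephemeral("127.0.0.1")

# Wildcard address: a listener bound this way accepts connections
# arriving on every network interface of the host.
wildcard_addr = bind_ephemeral("0.0.0.0")

print(loopback_addr[0], wildcard_addr[0])
```

A test that binds to the host's routable IP is reachable from the network for as long as the test runs, which is why the tests in this commit use `127.0.0.1` instead.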

tests/distributed/test_shm_broadcast.py

Lines changed: 4 additions & 4 deletions
@@ -9,7 +9,7 @@
 
 from vllm.distributed.device_communicators.shm_broadcast import MessageQueue
 from vllm.distributed.utils import StatelessProcessGroup
-from vllm.utils import get_ip, get_open_port, update_environment_variables
+from vllm.utils import get_open_port, update_environment_variables
 
 
 def get_arrays(n: int, seed: int = 0) -> list[np.ndarray]:
@@ -60,12 +60,12 @@ def worker_fn():
     rank = dist.get_rank()
     if rank == 0:
         port = get_open_port()
-        ip = get_ip()
+        ip = "127.0.0.1"
         dist.broadcast_object_list([ip, port], src=0)
     else:
-        recv = [None, None]
+        recv = [None, None]  # type: ignore
         dist.broadcast_object_list(recv, src=0)
-        ip, port = recv
+        ip, port = recv  # type: ignore
 
     stateless_pg = StatelessProcessGroup.create(ip, port, rank,
                                                 dist.get_world_size())

vllm/distributed/utils.py

Lines changed: 4 additions & 0 deletions
@@ -238,6 +238,10 @@ def create(
         used for exchanging metadata. With this function, process A and process B
         can call `StatelessProcessGroup.create` to form a group, and then process A, B,
         C, and D can call `StatelessProcessGroup.create` to form another group.
+
+        The `host` parameter should be an address on a secure network that is ideally
+        completely isolated. Services used on this network are insecure and will make
+        the system vulnerable to remote code execution if exposed to malicious parties.
         """ # noqa
         launch_server = rank == 0
         if launch_server:
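
A caller could apply a coarse sanity check to the address before passing it as `host`. The sketch below uses only the standard-library `ipaddress` module; `is_plausibly_internal` is a hypothetical helper, not a vLLM API. A private-range address is not proof of a secure or isolated network, so this only catches obvious mistakes such as a wildcard or public address.

```python
import ipaddress


def is_plausibly_internal(host: str) -> bool:
    """Coarse check on a literal IP address (hypothetical helper).

    Returns False for wildcard and public addresses, and for anything that
    is not a literal IP (hostnames are deliberately not resolved here).
    """
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False
    # Check the wildcard first: 0.0.0.0 also falls inside a range that
    # `is_private` reports as private.
    if addr.is_unspecified:
        return False
    return addr.is_loopback or addr.is_private


for candidate in ["127.0.0.1", "10.0.0.5", "0.0.0.0", "8.8.8.8"]:
    print(candidate, is_plausibly_internal(candidate))
```

Whether a given private network is actually isolated from untrusted parties still has to be established by the deployment, as the docstring above says.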
