Add monkey patch #24
Conversation
from vllm.platforms import current_platform
device_comm_cls = resolve_obj_by_qualname(
    current_platform.get_device_communicator_cls())
self.communicator = device_comm_cls(group=self.device_group,
I think we should check whether `use_xxx_communicator` (any one is fine, since they all carry the same value) and `world_size > 1` are true before creating the communicator.
https://github.com/vllm-project/vllm/blob/main/vllm/distributed/parallel_state.py#L167-L169
Besides the model parallel groups, there will be a world group, which won't use any device communication. Adding this check will reduce the time spent creating the world group.
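A minimal sketch of the suggested guard as it would sit inside GroupCoordinator.__init__; using `use_tpu_communicator` as the representative flag is the reviewer's suggestion here, not the final code, and the constructor arguments are abbreviated from the quoted diff:

from vllm.platforms import current_platform
from vllm.utils import resolve_obj_by_qualname

# All use_xxx_communicator flags share one value in upstream vLLM, so
# checking any of them is enough; the world group never does device
# communication, so it skips communicator creation entirely.
if self.use_tpu_communicator and self.world_size > 1:
    device_comm_cls = resolve_obj_by_qualname(
        current_platform.get_device_communicator_cls())
    # Constructor arguments abbreviated from the quoted diff.
    self.communicator = device_comm_cls(group=self.device_group)
else:
    self.communicator = None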
I added the `world_size` check in the new patch. There is no `use_xxx_communicator` in vllm.
I mean `use_tpu_communicator`, `use_xpu_communicator`, or `use_hpu_communicator`; any one of them is OK.
They are checked in `super().__init__`, right?
For example, the `use_tpu_communicator` check in `super().__init__` only works for the TPU communicator. We use it here for the NPU communicator, because there is no boolean flag for NPU to control this check.
I think we could just use `use_tpu_communicator`, since all the `use_xxx_communicator` flags remain the same in vLLM.
I got your idea, thanks. I'll update it then.
from vllm.platforms import current_platform
device_comm_cls = resolve_obj_by_qualname(
    current_platform.get_device_communicator_cls())
self.communicator = device_comm_cls(group=self.device_group,
Does this still depend on the vllm-project/vllm `CommunicatorBase`? It seems `CommunicatorBase` should also move to vllm-ascend?
https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/communicator.py#L21
Removed `CommunicatorBase` in the new patchset.
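For illustration only, a minimal sketch of a communicator that no longer inherits from vLLM's `CommunicatorBase`; the class name and constructor signature are assumptions based on the quoted diff, not the exact vllm-ascend code:

import torch
import torch.distributed as dist

class NPUCommunicator:
    # Standalone class: nothing is imported from vllm's CommunicatorBase.
    def __init__(self, group: dist.ProcessGroup):
        self.group = group

    def all_reduce(self, x: torch.Tensor) -> torch.Tensor:
        # In-place sum reduction across all ranks in the group.
        dist.all_reduce(x, group=self.group)
        return x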
# Remove this file once this is supported by vllm via
# https://github.com/vllm-project/vllm/pull/11324.

from vllm.distributed.parallel_state import GroupCoordinator
Unrelated, but just curious: should vllm be a dependency of vllm-ascend, as one line in requirements and pyproject?
Emm, let's have a try; we can add it.
That said, IMO it may raise an error, because there is no CPU version of PyTorch on PyPI.
Once it's added, the install steps in the future, as I see it, would be:
- install the CPU version of PyTorch by hand (torch==2.5.1+cpu)
- pip install vllm-ascend
No worries, we can do it in a follow-up.
    def all_reduce(self, x: torch.Tensor) -> torch.Tensor:
        dist.all_reduce(x, group=self.group)
        return x

    def gather(self, input_: torch.Tensor, dst: int = 0, dim: int = -1):
Do we have any UT to check the functionality?
The communicator tests need more than one NPU card, which is not supported by the current CI; we're working on multi-card support for the CI system.
For now, we need to test this PR by hand locally and be careful when merging it.
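As a rough sketch of the kind of manual check this implies (a two-process launch; gloo on CPU stands in here for the Ascend backend and NPU tensors that a real test would use):

import torch
import torch.distributed as dist

def main():
    # Launch with: torchrun --nproc_per_node=2 smoke_test.py
    # gloo/CPU is a stand-in; a real run would initialize the Ascend
    # backend and move tensors onto NPU devices.
    dist.init_process_group(backend="gloo")
    rank = dist.get_rank()
    x = torch.ones(4) * (rank + 1)
    dist.all_reduce(x)  # sum over 2 ranks: 1 + 2 -> a tensor of 3s
    assert torch.allclose(x, torch.full((4,), 3.0))
    dist.destroy_process_group()

if __name__ == "__main__":
    main()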
            output_tensor = None
        return output_tensor

    def all_gather(self, input_: torch.Tensor, dim: int = -1) -> torch.Tensor:
ditto
Do not merge until it's fully tested locally. Thanks.
Lines 12 to 14 in 7006835: this should also be removed.
LGTM if it passes in a multi-card env.
See #30
### What this PR does / why we need it?
- Remove mypy check on communicator to address: #24 (comment)
- Add mypy.ini to trigger list

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <[email protected]>
Some PRs for plugin support are not merged by vllm yet. This PR adds monkey patches to vllm-ascend so that vllm-ascend works with vllm directly.
The patch code should be removed once the related functionality is supported by vllm natively.
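To illustrate the monkey-patch pattern (not the exact patch in this PR), the idea is to rebind a vllm symbol to an ascend-aware replacement at import time; `GroupCoordinatorPatch` below is a hypothetical name:

import vllm.distributed.parallel_state as parallel_state

class GroupCoordinatorPatch(parallel_state.GroupCoordinator):
    # Hypothetical subclass that would carry the NPU-specific behavior.
    pass

# Rebind the symbol so code importing it from vllm picks up the patch.
parallel_state.GroupCoordinator = GroupCoordinatorPatch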