Add monkey patch #24
Conversation
from vllm.platforms import current_platform
device_comm_cls = resolve_obj_by_qualname(
    current_platform.get_device_communicator_cls())
self.communicator = device_comm_cls(group=self.device_group,
I think we should check whether `use_xxx_communicator` (any one is fine, since they all carry the same value) and `world_size > 1` are true before creating the communicator.
https://github.com/vllm-project/vllm/blob/main/vllm/distributed/parallel_state.py#L167-L169
Besides the model parallel groups, there will be a world group, which won't use any device communication. Adding this check will reduce the time spent creating the world group.
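A minimal sketch of the suggested guard as it would sit inside GroupCoordinator.__init__; using `use_tpu_communicator` as the representative flag is the reviewer's suggestion here, not the final code, and the constructor arguments are abbreviated from the quoted diff:

from vllm.platforms import current_platform
from vllm.utils import resolve_obj_by_qualname

# All use_xxx_communicator flags share one value in upstream vLLM, so
# checking any of them is enough; the world group never does device
# communication, so it skips communicator creation entirely.
if self.use_tpu_communicator and self.world_size > 1:
    device_comm_cls = resolve_obj_by_qualname(
        current_platform.get_device_communicator_cls())
    # Constructor arguments abbreviated from the quoted diff.
    self.communicator = device_comm_cls(group=self.device_group)
else:
    self.communicator = None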
I added the `world_size` check in the new patch. There is no `use_xxx_communicator` in vllm.
I mean `use_tpu_communicator`, `use_xpu_communicator`, or `use_hpu_communicator`; any one of them is OK.
They are checked in `super().__init__`, right?
For example, the `use_tpu_communicator` check in `super().__init__` only works for the TPU communicator. We use it here for the NPU communicator, because there is no boolean flag for NPU to control this check.
I think we could just use `use_tpu_communicator`, since all the `use_xxx_communicator` flags remain the same in vLLM.
I got your idea, thanks. I'll update it then.
from vllm.platforms import current_platform
device_comm_cls = resolve_obj_by_qualname(
    current_platform.get_device_communicator_cls())
self.communicator = device_comm_cls(group=self.device_group,
Does this still depend on the vllm-project/vllm `CommunicatorBase`? It seems `CommunicatorBase` should also move to vllm-ascend?
https://github.com/vllm-project/vllm-ascend/blob/main/vllm_ascend/communicator.py#L21
Removed `CommunicatorBase` in the new patchset.
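For illustration only, a minimal sketch of a communicator that no longer inherits from vLLM's `CommunicatorBase`; the class name and constructor signature are assumptions based on the quoted diff, not the exact vllm-ascend code:

import torch
import torch.distributed as dist

class NPUCommunicator:
    # Standalone class: nothing is imported from vllm's CommunicatorBase.
    def __init__(self, group: dist.ProcessGroup):
        self.group = group

    def all_reduce(self, x: torch.Tensor) -> torch.Tensor:
        # In-place sum reduction across all ranks in the group.
        dist.all_reduce(x, group=self.group)
        return x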
# Remove this file once this is supported by vllm via
# https://github.com/vllm-project/vllm/pull/11324.

from vllm.distributed.parallel_state import GroupCoordinator
Unrelated, but just curious: should vllm be a dependency of vllm-ascend, as one line in requirements and pyproject?
Emm, let's have a try; we can add it.
That said, IMO it may raise an error, because there is no CPU version of PyTorch on PyPI.
Once it's added, the install steps in the future, as I see it, would be:
- install the CPU version of PyTorch by hand (torch==2.5.1+cpu)
- pip install vllm-ascend
No worries, we can do it in a follow-up.
    def all_reduce(self, x: torch.Tensor) -> torch.Tensor:
        dist.all_reduce(x, group=self.group)
        return x

    def gather(self, input_: torch.Tensor, dst: int = 0, dim: int = -1):
Do we have any UT to check the functionality?
The communicator tests need more than one NPU card, which is not supported by the current CI; we're working on multi-card support for the CI system.
For now, we need to test this PR by hand locally and be careful when merging it.
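As a rough sketch of the kind of manual check this implies (a two-process launch; gloo on CPU stands in here for the Ascend backend and NPU tensors that a real test would use):

import torch
import torch.distributed as dist

def main():
    # Launch with: torchrun --nproc_per_node=2 smoke_test.py
    # gloo/CPU is a stand-in; a real run would initialize the Ascend
    # backend and move tensors onto NPU devices.
    dist.init_process_group(backend="gloo")
    rank = dist.get_rank()
    x = torch.ones(4) * (rank + 1)
    dist.all_reduce(x)  # sum over 2 ranks: 1 + 2 -> a tensor of 3s
    assert torch.allclose(x, torch.full((4,), 3.0))
    dist.destroy_process_group()

if __name__ == "__main__":
    main()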
            output_tensor = None
        return output_tensor

    def all_gather(self, input_: torch.Tensor, dim: int = -1) -> torch.Tensor:
ditto
Do not merge until it's fully tested locally. Thanks.
Lines 12 to 14 in 7006835: this should also be removed.
LGTM if it passes in a multi-card env.
See #30
### What this PR does / why we need it?
- Remove mypy check on communicator to address: #24 (comment)
- Add mypy.ini to trigger list

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <[email protected]>
Some PRs for plugin support are not merged by vllm yet. This PR adds monkey patches to vllm-ascend so that vllm-ascend works with vllm directly.
The patch code should be removed once the related functionality is supported by vllm natively.
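To illustrate the monkey-patch pattern (not the exact patch in this PR), the idea is to rebind a vllm symbol to an ascend-aware replacement at import time; `GroupCoordinatorPatch` below is a hypothetical name:

import vllm.distributed.parallel_state as parallel_state

class GroupCoordinatorPatch(parallel_state.GroupCoordinator):
    # Hypothetical subclass that would carry the NPU-specific behavior.
    pass

# Rebind the symbol so code importing it from vllm picks up the patch.
parallel_state.GroupCoordinator = GroupCoordinatorPatch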