
[attn] fix device of tensors in attention #25


Merged
merged 1 commit into vllm-project:main from rank on Feb 10, 2025

Conversation

MengqingCao
Collaborator

What this PR does / why we need it?

Fix the device of tensors created in `AscendAttentionBackendImpl`.

When a device other than card-0 is specified, a device conflict occurs because tensors such as `attn_mask` are placed on card-0 by default.

This PR creates these tensors on the card corresponding to the input.
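
A minimal sketch of the idea, assuming the mask is built inside the attention backend; the helper name and shapes below are illustrative, not the actual `AscendAttentionBackendImpl` code:

```python
import torch

def build_attn_mask(query: torch.Tensor, seq_len: int) -> torch.Tensor:
    # Illustrative sketch: allocate the causal mask on the device of the
    # incoming query tensor (e.g. "npu:1") rather than the default device
    # (card-0), avoiding the cross-device conflict described above.
    return torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=query.device),
        diagonal=1,
    )
```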

Does this PR introduce any user-facing change?

With this PR, users can specify the device by local rank. A corresponding change in vLLM is also needed and will be linked to this PR once it is created.

How was this patch tested?

This was tested locally with the following code. A test case will be added once the corresponding change in vLLM is completed.

```python
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# Create a sampling params object.
sampling_params = SamplingParams(max_tokens=100, temperature=0.0)
# Create an LLM.
llm = LLM(model="~/.cache/modelscope/hub/Qwen/Qwen2___5-7B-Instruct", device="npu:1")

# Generate texts from the prompts.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

Results

(screenshot of the generation results)

@MengqingCao
Collaborator Author

cc @ji-huazhong

Signed-off-by: MengqingCao <[email protected]>
@MengqingCao
Collaborator Author

Related PR on vLLM: vllm-project/vllm#13027

@wangxiyuan wangxiyuan merged commit 7006835 into vllm-project:main Feb 10, 2025
8 checks passed
@MengqingCao MengqingCao deleted the rank branch February 25, 2025 08:47
ttanzhiqiang pushed a commit to ttanzhiqiang/vllm-ascend that referenced this pull request Apr 27, 2025