Avoid mistakenly picking Gaudi/HPU if XPU is requested. #11018

janimo · 2024-12-09T13:11:30Z

In setup.py, _is_hpu() returns True when /dev/accel/accel0 is present, but that device node can also exist on machines with XPU devices (Intel iGPU).

This change allows VLLM_TARGET_DEVICE="xpu" to override that and proceed with an XPU install.
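A minimal sketch of the override described above, not the actual setup.py code. The function name `is_hpu_sketch` and both parameters are hypothetical; the result of the /dev/accel/accel0 check is passed in as a boolean so the logic can be read without touching the filesystem.

```python
def is_hpu_sketch(target_device: str, accel_node_present: bool) -> bool:
    """Hypothetical stand-in for the HPU check in setup.py."""
    # An explicit XPU request wins over the device-node heuristic.
    if target_device == "xpu":
        return False
    # Otherwise fall back to the heuristic, or to an explicit HPU request.
    return accel_node_present or target_device == "hpu"
```

With this shape, an XPU request on a machine that exposes the accelerator node no longer resolves to HPU, while setups that rely on the heuristic alone keep working.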

Maybe a simpler is_hpu() that solely relies on VLLM_TARGET_DEVICE="hpu" would be cleaner but the current setup is probably needed for some existing setups.

github-actions · 2024-12-09T13:11:44Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

- Add ready label to the PR
- Enable auto-merge.

🚀

jikunshang · 2024-12-10T02:18:26Z

Nice catch! I think you are using a recent Intel CPU with both a GPU and an NPU, and /dev/accel/accel0 is actually the NPU device. So when you want to install vllm-XPU, the is_hpu method also returns True, right?
cc @kzawora-intel, is there a better way to determine whether an HPU (rather than an NPU or other accelerator device) exists on the host?

janimo · 2024-12-13T16:15:12Z

@jikunshang updated the change to be simpler and more generic. On the same computer with that NPU present, openvino, xpu and cpu are all valid target names and are all preceded by the is_hpu() check in setup.py and thus cannot be selected.

jikunshang · 2024-12-15T09:42:17Z

> @jikunshang updated the change to be simpler and more generic. On the same computer with that NPU present, openvino, xpu and cpu are all valid target names and are all preceded by the is_hpu() check in setup.py and thus cannot be selected.

I feel the best fix is to make is_hpu_available correct. Your fix absolutely works, but with it users need to set an extra VLLM_TARGET_DEVICE=hpu on HPU devices. Any comments, @kzawora-intel?

janimo · 2024-12-15T09:51:40Z

@jikunshang you're right because VLLM_TARGET_DEVICE is always set, defaulting to cuda :(

I think explicitly requiring it to be set for HPU, as is done for most other targets, would be best. If there are setups where vllm on Gaudi is tested relying only on hw detection, they would need to be updated, and that may be too complicated (if CI/production systems are not keeping up closely with changes in vllm).

Other ways of fixing this:

- Have the target be empty if not set, but this will probably break many more setups that just assume cuda.
- Return false if the target is cpu/xpu/openvino, which are the situations where hpu can be misdetected. This would probably be a good localized fix.
- Put the checks for is_hpu at the end, after cpu/xpu/openvino, in the two places in setup.py where such tests are run, with a comment explaining that order is important.
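The last option above (moving the HPU check after the other targets) could look roughly like this. It is only an illustration: `pick_target` and its parameters are hypothetical names, and the real setup.py performs these checks in two separate places rather than in one function.

```python
def pick_target(requested: str, hpu_heuristic: bool) -> str:
    """Hypothetical dispatcher showing why check order matters."""
    # Explicitly named non-HPU targets are checked first, so a stray
    # accelerator device node cannot hijack them. Order is important here.
    for name in ("cpu", "xpu", "openvino"):
        if requested == name:
            return name
    # Only now consult the hardware heuristic (or an explicit "hpu" request).
    if hpu_heuristic or requested == "hpu":
        return "hpu"
    # Anything else (including the "cuda" default) passes through unchanged.
    return requested
```

Note that the default target still resolves to HPU whenever the heuristic fires, so this ordering only protects the targets that are named explicitly.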

I still find the cleanest would be every target being selected explicitly. I feel the heuristics for the hw check may get even more complex with different types of hw and naming of devices in the future unless there is a single good indicator of Gaudi hw being present (like a line in dmesg)

kzawora-intel · 2025-01-14T15:14:42Z

Hi, I opened PR #12046 to fix this issue. It is definitely a bug in _is_hpu that it returns True for non-HPU platforms; it should not do that. To prevent any future misdetections, I skip autodetection if a different platform is requested explicitly with VLLM_TARGET_DEVICE, and if autodetection continues, it no longer uses conditions that could pass on different platforms.
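The approach described here can be sketched as follows. This is not the code from #12046: the function name, the `env` mapping, and the `probe_gaudi` callable are all assumptions made for illustration, with `probe_gaudi` standing in for whatever Gaudi-specific check the autodetection uses.

```python
from typing import Callable, Mapping


def is_hpu_explicit_first(env: Mapping[str, str],
                          probe_gaudi: Callable[[], bool]) -> bool:
    """Hypothetical sketch: an explicit request short-circuits autodetection."""
    requested = env.get("VLLM_TARGET_DEVICE")
    if requested is not None:
        # The user named a platform explicitly, so never autodetect.
        return requested == "hpu"
    # No explicit request: rely only on a probe that cannot succeed
    # on non-Gaudi hardware (supplied by the caller in this sketch).
    return probe_gaudi()
```

Reading the raw environment, instead of a value that has already been defaulted to cuda, is what lets the check distinguish "not set" from "set to something else".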

mergify bot added the ci/build label Dec 9, 2024

janimo force-pushed the hpu-check-fix branch from 5f0e5c3 to 5c4cf90 Compare December 9, 2024 13:26

Avoid mistakenly picking Gaudi/HPU if XPU is requested.

9f23dbf

janimo force-pushed the hpu-check-fix branch from 5c4cf90 to 9f23dbf Compare December 13, 2024 16:07

kzawora-intel mentioned this pull request Jan 14, 2025

[HPU][Bugfix] Don't use /dev/accel/accel0 for HPU autodetection in setup.py #12046

Merged

DarkLight1337 closed this in #12046 Jan 15, 2025
