
Automatic setup for AMD GPUs on Linux #6709

Merged (10 commits) on Jan 15, 2023


@daniandtheweb (Contributor) commented Jan 13, 2023

Describe what this pull request is trying to achieve.

This pull request automates the setup process for AMD users on Linux.

Environment this was tested in

  • OS: Linux, Windows
  • Browser: Firefox
  • Graphics card: AMD RX 5700 XT

This commit adds a few lines that detect whether the system has an AMD GPU and set an environment variable torch needs to recognize the GPU.
This commit adds a script that detects which GPU is currently in use on Windows and Linux.
This commit lets the launch script automatically download the ROCm build of torch for AMD GPUs, using an external GPU detection script. It also prints the operating system and GPU in use.
This fixes the script on macOS.
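The "print the OS and GPU" behaviour and the macOS fix could look roughly like the sketch below. It is hypothetical, not the PR's actual detection script: it assumes `uname -s` is enough to tell Linux from macOS, and that `lspci` (from pciutils) exists on Linux but not on stock macOS, so the macOS branch never calls it. The function name `print_system_info` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: report the OS and GPU without calling lspci where it
# is normally missing (stock macOS).
print_system_info() {
    local os gpu
    os=$(uname -s)
    case "$os" in
        Linux)
            # lspci comes from pciutils; fall back to empty output if it is absent.
            gpu=$(lspci 2>/dev/null | grep VGA || true)
            ;;
        Darwin)
            # lspci is not available on stock macOS, so skip detection there.
            gpu="(GPU detection skipped on macOS)"
            ;;
        *)
            gpu=""
            ;;
    esac
    echo "OS: $os"
    echo "GPU: ${gpu:-unknown}"
}

print_system_info
```

Keeping the OS check in front of any `lspci` call means a missing tool degrades to "unknown" instead of aborting the launch script.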
@AUTOMATIC1111 (Owner)

    gpu_info=$(lspci | grep VGA)
    if echo "$gpu_info" | grep -q "AMD"

If those two lines are enough to check for AMD, why all the added code? You can set TORCH_COMMAND from the sh file.
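A minimal sketch of the owner's suggestion: keep the two-line `lspci` check and set `TORCH_COMMAND` directly in the .sh file. The ROCm wheel index URL (`rocm5.2`) is an assumption for illustration; the right one depends on the installed ROCm version. `is_amd_gpu` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: choose a ROCm build of torch when lspci reports an AMD VGA device.
# The rocm5.2 index URL is an assumption; match it to your ROCm install.

# Succeeds if the given lspci text contains an AMD VGA controller line.
is_amd_gpu() {
    echo "$1" | grep VGA | grep -q "AMD"
}

# Only run real detection where lspci exists.
if command -v lspci >/dev/null 2>&1; then
    gpu_info=$(lspci | grep VGA)
    if is_amd_gpu "$gpu_info"; then
        export TORCH_COMMAND="pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.2"
    fi
fi
```

Taking the `lspci` text as an argument keeps the check testable on machines without an AMD card.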

@daniandtheweb (Contributor, Author)

I tried to create a universal way to detect the GPU on both Windows and Linux, to prepare for possible future compatibility with DirectML. I can make another commit that deletes the detection code and installs the ROCm packages directly from the webui script, which keeps everything smaller, and revisit the detection part only when it's needed.

@daniandtheweb (Contributor, Author) commented Jan 14, 2023

The only manual step still needed to run this on an AMD system is adding the --precision full and --no-half arguments (at least on the RX 5700 XT), but those can now be added to the webui-user script. After this change, the wiki could be updated with the simplified setup process.
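For the RX 5700 XT case, the webui-user script line might look like the fragment below. It assumes webui-user.sh passes extra launch flags through the `COMMANDLINE_ARGS` variable. Other AMD cards may not need either flag.

```shell
#!/usr/bin/env bash
# Sketch of a webui-user.sh line for the RX 5700 XT setup described above.
# Assumes COMMANDLINE_ARGS carries extra flags to the launcher.
export COMMANDLINE_ARGS="--precision full --no-half"
```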

@daniandtheweb (Contributor, Author) commented Jan 14, 2023

About HSA_OVERRIDE_GFX_VERSION=10.3.0: I know it's needed on the 5700 XT, but I'm not sure whether other cards need it as well, and I have no way to check. If it turns out to be harmless on other AMD cards, I guess it would be OK to leave it.

@daniandtheweb (Contributor, Author) commented Jan 14, 2023

I found this:

Because GFX1030 is the series model name of RDNA2 (i.e. the series contains the 6700 XT, 6800 XT and 6900 XT), and at the time of my last comment only the 6900 XT of that series was officially supported by ROCm, HSA_OVERRIDE_GFX_VERSION=10.3.0 means you pretend to have a 6900 XT, so that ROCm lets you run the following Python file. Furthermore, for rocm_tensorflow (at least when using OpenCV), pretending to be an RX 580 is sometimes useful (HSA_OVERRIDE_GFX_VERSION=8.3.0); otherwise my 5700 XT gets a memory error!

So it seems safe to leave the variable for now.
If necessary, I can add a check so that it's only set on 5000-series cards.

@AUTOMATIC1111 (Owner)

TORCH_COMMAND should be set only if the user didn't set it in the user.sh script, similar to how LAUNCH_SCRIPT is set.
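A minimal sketch of the "only set it if the user didn't" pattern the owner describes. The defaults shown (`launch.py`, the `rocm5.2` index URL) and the helper name `set_default_torch_command` are illustrative assumptions, not the merged code.

```shell
#!/usr/bin/env bash
# Sketch: fill in defaults only when the variable is empty, so a value
# exported by the user script beforehand always wins.

# Export a default TORCH_COMMAND unless one is already set.
set_default_torch_command() {
    if [[ -z "${TORCH_COMMAND:-}" ]]; then
        export TORCH_COMMAND="$1"
    fi
}

# The same check written inline, LAUNCH_SCRIPT-style (assumed default).
if [[ -z "${LAUNCH_SCRIPT:-}" ]]; then
    LAUNCH_SCRIPT="launch.py"
fi

# Hypothetical ROCm default, applied only if the user left TORCH_COMMAND unset.
set_default_torch_command "pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/rocm5.2"
```

Because the check looks at the variable's value rather than the GPU, a user who exports their own `TORCH_COMMAND` in the user script is never overridden by detection.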

@AUTOMATIC1111 merged commit ebfdd7b into AUTOMATIC1111:master on Jan 15, 2023
@vt-idiot (Contributor)


It should have only been added for Navi/RDNA. Using HSA_OVERRIDE_GFX_VERSION=10.3.0 on a Polaris card (gfx803) messes things up, IIRC.

@daniandtheweb (Contributor, Author)

Do you know whether the variable causes issues on, or is necessary for, cards other than the 5000 series? I can't find any more info about it. If it causes issues on any card other than the 5000s, I can add a check for it; I just don't know whether the 6000 series needs the variable too.

@vt-idiot (Contributor)

I do have an RX 5700 I could test it out on, but I'd need, realistically, an afternoon and an ounce. Maybe a few bars. And more seriously, a 2nd drive to try dual-booting from.

This was my original frame of reference:
https://rentry.org/sd-nativeisekaitoo#assertion-failed-torch-is-not-able-to-use-gpu-or-hiperrornobinaryforgpu-unable-to-find-code-object-for-all-current-devices

ROCM_ENABLE_PRE_VEGA=1

mentioned for pre-Vega cards, so I guess that'd include my Polaris cards as well, and:

HSA_OVERRIDE_GFX_VERSION=10.3.0

mentioned for "newer" cards, whatever that means.

https://forum.garudalinux.org/t/trying-to-run-stable-diffusion-with-rx-590/22898/12 mentions that forcing HSA_OVERRIDE_GFX_VERSION=10.3.0 on Polaris causes a segfault. I've never read of anyone using it successfully with a card older than Navi. And IIRC, you shouldn't force the variable on gfx9xx/Vega cards either, since most of them still support ROCm directly with no workarounds. Tricking ROCm into thinking it's a gfx1030 card really should be reserved for unsupported RDNA/RDNA2 cards.

People use it to run on gfx1032 (RX 6700S); it seems to work for the RX 5500 XT as well.

openSUSE wiki, "force same generation"

Sample list of supported architectures from rocBLAS:

gfx803 gfx900 gfx906:xnack- gfx908:xnack- gfx90a:xnack+ gfx90a:xnack- gfx1010 gfx1012 gfx1030 gfx1100 gfx1101 gfx1102

So forcing the 10.3.0 override is useful for e.g. gfx1031 and gfx1032, which aren't officially supported but don't seem to complain about running with the flag. gfx90X are the Vega-based cards, gfx101X are Navi, and gfx803 is Polaris, which they technically dropped, but kinda sorta not really. GG AMD.
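The rules of thumb in this comment can be condensed into a lookup. This is a sketch of the thread's advice, not an official AMD compatibility table. The function name `suggest_rocm_env` is hypothetical, and targets not discussed here deliberately get no suggestion.

```shell
#!/usr/bin/env bash
# Rule-of-thumb sketch of this thread's advice (not an official table).
# Given a gfx target, print the workaround variable the thread suggests,
# or nothing when no override is advised.
suggest_rocm_env() {
    case "$1" in
        gfx803)
            # Polaris is pre-Vega; forcing 10.3.0 here reportedly segfaults.
            echo "ROCM_ENABLE_PRE_VEGA=1" ;;
        gfx90*)
            # Vega: most cards run on ROCm without workarounds.
            return 0 ;;
        gfx1010|gfx1012|gfx1030)
            # In the rocBLAS list, so no override should be needed in theory
            # (the RX 5700 XT in this thread still needed it in practice).
            return 0 ;;
        gfx103*)
            # Other RDNA2 targets such as gfx1031/gfx1032: pretend to be gfx1030.
            echo "HSA_OVERRIDE_GFX_VERSION=10.3.0" ;;
        *)
            # Unlisted target: don't guess.
            return 0 ;;
    esac
}
```

`case` takes the first matching pattern, so `gfx1030` hits the "supported" branch before the broader `gfx103*` override branch.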

I'm not sure why you needed the override for your RX 5700 XT; gfx1010 is supposed to be supported.

AMD Radeon RX 5700 XT Specs | TechPowerUp GPU Database

Graphics/Compute: GFX10.1 (gfx1010)
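You can also read the gfx target from your own machine instead of a spec database. This sketch assumes ROCm's `rocminfo` tool is installed and prints agent names such as `gfx1010`. The filter simply extracts every gfx token from its input. `gfx_targets` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: pull gfx target names (e.g. gfx1010, gfx90a) out of rocminfo-style
# text on stdin and print each one once.
gfx_targets() {
    grep -oE 'gfx[0-9a-f]+' | sort -u
}

# Usage on a machine with ROCm installed:
#   rocminfo | gfx_targets
```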

@daniandtheweb (Contributor, Author)

I can try later to add a check to just include the variable on RDNA and RDNA2 cards then. I’ve also seen that my GPU should be supported natively but in practice the program without the variable just doesn’t detect the card.

@daniandtheweb (Contributor, Author)

I just opened a pull request that adds a GPU check: the environment variable is now only set if the card belongs to the Navi family.
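A Navi-only check could be sketched as below. This is hypothetical and may differ from the follow-up PR. It assumes `lspci` names RDNA/RDNA2 chips "Navi 1x"/"Navi 2x" (e.g. "Navi 10"), while Polaris cards show codenames such as "Ellesmere". `is_navi_gpu` is an illustrative name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: set the gfx override only for Navi (RDNA/RDNA2) cards.
is_navi_gpu() {
    case "$1" in
        *"Navi 1"*|*"Navi 2"*) return 0 ;;
        *) return 1 ;;
    esac
}

if command -v lspci >/dev/null 2>&1; then
    gpu_info=$(lspci 2>/dev/null | grep VGA)
    # Polaris and Vega cards are left alone.
    if is_navi_gpu "$gpu_info"; then
        export HSA_OVERRIDE_GFX_VERSION=10.3.0
    fi
fi
```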

@outget commented Mar 9, 2023


So would this mean that cards like the RX 580 8GB (the one I own) are not supported at all, since they are gfx803? I've been trying for days to get this to work, but I keep going in circles starting from scratch. The first thing that happens is the CUDA error, so I add the --skip-torch-cuda-test flag, and then I get many different core dumps while playing with the other flags that are supposed to help with AMD GPUs (--no-half and --precision full).
