
Support for Apple silicon #252

Open
rickardp opened this issue Apr 1, 2023 · 68 comments
Labels
Cross Platform · Duplicate (This issue or pull request already exists) · Enhancement (New feature or request) · macOS

Comments

@rickardp
Contributor

rickardp commented Apr 1, 2023

Would it make sense for this library to support platforms other than CUDA on x64 Linux? I am specifically looking for Apple silicon support. Currently not even cpuonly works, since it assumes SSE2 support (and there is no Neon support).

I would guess that the first step would be a full cross-platform compile (arm64), then ideally support for Metal Performance Shaders as an alternative to CUDA (assuming that is at all feasible).

I could probably contribute somewhat toward this if there is interest in making bitsandbytes multi-platform. I have some experience setting up cross-platform Python libraries.

@TheStoneMX

TheStoneMX commented Apr 2, 2023

Hi there, I will contribute too, in order to get this working with Metal on Apple M1.

This is my trace:

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
dlopen(/Users/raziel/miniconda3/envs/nlp/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so, 0x0006): tried: '/Users/raziel/miniconda3/envs/nlp/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (not a mach-o file), '/System/Volumes/Preboot/Cryptexes/OS/Users/raziel/miniconda3/envs/nlp/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (no such file), '/Users/raziel/miniconda3/envs/nlp/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (not a mach-o file)
CUDA SETUP: Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
dlopen(/Users/raziel/miniconda3/envs/nlp/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so, 0x0006): tried: '/Users/raziel/miniconda3/envs/nlp/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (not a mach-o file), '/System/Volumes/Preboot/Cryptexes/OS/Users/raziel/miniconda3/envs/nlp/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (no such file), '/Users/raziel/miniconda3/envs/nlp/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (not a mach-o file)
/Users/raziel/miniconda3/envs/nlp/lib/python3.9/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
--------------------------------------------------------------------------------------------
# What version of Python do you have?
import sys
import platform
import torch

has_gpu = torch.cuda.is_available()
has_mps = getattr(torch, 'has_mps', False)
print('has_mps', has_mps)
# "cuda" (not "gpu") is the valid torch device string; reuse the flags computed above
device = "mps" if has_mps else "cuda" if has_gpu else "cpu"

print(f"Python Platform: {platform.platform()}")
print(f"PyTorch Version: {torch.__version__}")
print()
print(f"Python {sys.version}")
print("GPU is", "available" if has_gpu else "NOT AVAILABLE")
print("MPS (Apple Metal) is", "AVAILABLE" if has_mps else "NOT AVAILABLE")
print(f"Target device is {device}")
----------------------------------------------------------------------------------
has_mps True
Python Platform: macOS-13.3-arm64-arm-64bit
PyTorch Version: 2.0.0

Python 3.9.16 | packaged by conda-forge | (main, Feb  1 2023, 21:38:11) 
[Clang 14.0.6 ]
GPU is NOT AVAILABLE
MPS (Apple Metal) is AVAILABLE
Target device is mps

@rickardp
Contributor Author

rickardp commented Apr 3, 2023

Nice to hear! It would be good to hear from the maintainers whether they are at all interested in making this package cross-platform. It is very much CUDA-focused at the moment.

Getting libbitsandbytes_cpu.so to compile for macOS arm64 was not at all difficult, just an exercise in moving around some #ifdefs, but the CPU backend would obviously need Neon (SIMD) support to make any sense, IMHO. Then, of course, MPS support would be needed at some point (though I expect that's quite a lot more work).

I've just started looking at the unit tests and the Python libraries.

The C++ code is quite nicely structured, but the Python code would need some refactoring, since most of the calls assume CUDA (x.cuda() instead of x.to(device), etc.). Also, since the CPU version does not cover 100% of the feature set, testing is going to be quite some work, as there is no real baseline. I suppose one question is whether it would make sense to have the CPU backend cover 100% of the API calls, even if inefficiently, just to provide a baseline that the GPU implementations could be compared against.
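
For illustration, a minimal sketch of the device-agnostic pattern meant here (the pick_test_device helper is hypothetical, not existing bitsandbytes code):

import torch

def pick_test_device() -> torch.device:
    # Hypothetical helper: prefer CUDA, then MPS, then plain CPU.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_test_device()
x = torch.randn(16, 16).to(device)  # instead of x.cuda(), which hard-codes CUDA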

If pursuing this, I propose implementing cross-platform CPU support first, then tackling MPS. MPS is of course what makes it useful.

(I have the exact same setup BTW, 2021 MBP)

Edit: Specifically, here's how I imagine the unit tests would have to work
https://github.com/TimDettmers/bitsandbytes/pull/257/files#diff-659bad232c71219167252c1a5ccbc427b6f54925b78741df18613c3c49aaa4c1R153

[Screenshot: pytest output showing a CPU test passing on an M1 Mac]

So at least one CPU test passes on my M1 Mac :)

@janrinze

janrinze commented Apr 5, 2023

Please have a look at "Building on Jetson AGX Xavier Development Kit fails" (#221).
It addresses the same AArch64 issue, but on CUDA-supported platforms such as the NVIDIA Jetson.

@UserHIJ

UserHIJ commented Jun 24, 2023

Wow... not to be inflammatory, but are we saying that there's no immediate solution for this if you have any MacBook from the last, like, 5 years? Yuck.

@janrinze

The Apple M1 (https://en.wikipedia.org/wiki/Apple_M1) was introduced less than 3 years ago.
Things take time in the world of open source, especially when the hardware is Apple's.

@KotlinFactory

When will this be done?

@benjaminhuo

benjaminhuo commented Aug 21, 2023

Would it make sense for this library to support platforms other than CUDA on x64 Linux? I am specifically looking for Apple silicon support. Currently not even cpuonly works, since it assumes SSE2 support (and there is no Neon support).

I would guess that the first step would be a full cross-platform compile (arm64), then ideally support for Metal Performance Shaders as an alternative to CUDA (assuming that is at all feasible).

I could probably contribute somewhat toward this if there is interest in making bitsandbytes multi-platform. I have some experience setting up cross-platform Python libraries.

Looking forward to support for this too. I got the errors below when I tried to fine-tune Llama 2 7B with load_in_8bit=True enabled on my MacBook M2. PyTorch's support for MPS is getting better, and I hope this project can support it as well:

  File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 293, in forward
    using_igemmlt = supports_igemmlt(A.device) and not state.force_no_igemmlt
  File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 226, in supports_igemmlt
    if torch.cuda.get_device_capability(device=device) < (7, 5):
  File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/torch/cuda/__init__.py", line 381, in get_device_capability
    prop = get_device_properties(device)
  File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/torch/cuda/__init__.py", line 395, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
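
For context, a hedged sketch of the kind of loading call that reaches this code path (the model id is a placeholder and the exact arguments used above may differ):

from transformers import AutoModelForCausalLM

# Illustrative only: load_in_8bit=True routes nn.Linear layers through
# bitsandbytes' 8-bit path, whose forward pass assumes a CUDA device.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model id
    load_in_8bit=True,
)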

@AlexandreCassagne

@benjaminhuo Getting the same issue as you.

@id4thomas

  File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 293, in forward
    using_igemmlt = supports_igemmlt(A.device) and not state.force_no_igemmlt
  File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py", line 226, in supports_igemmlt
    if torch.cuda.get_device_capability(device=device) < (7, 5):
  File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/torch/cuda/__init__.py", line 381, in get_device_capability
    prop = get_device_properties(device)
  File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/torch/cuda/__init__.py", line 395, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/Users/ben/opt/miniconda3/envs/finetune/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

https://github.com/TimDettmers/bitsandbytes/blob/18e827d666fa2b70a12d539ccedc17aa51b2c97c/bitsandbytes/autograd/_functions.py#L227

This seems to be due to calling torch.cuda even when the device type isn't cuda.
One way to patch these unchecked torch.cuda calls is to add device checks like:

if device.type != 'cuda':
    return False

MPS devices report "mps" as device.type.
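
For illustration, a hedged sketch of how such a guard could look in supports_igemmlt (simplified; the actual function performs additional checks):

import torch

def supports_igemmlt(device: torch.device) -> bool:
    # Sketch only: bail out before touching torch.cuda on non-CUDA devices.
    if device.type != "cuda":  # covers "mps" and "cpu"
        return False
    # Only reached for CUDA devices, so this call is now safe.
    return torch.cuda.get_device_capability(device=device) >= (7, 5)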

@pechaut78

Same issue here; MPS seems to be the problem.

@ProjectProgramAMark

Getting the same issue with Apple silicon. Would love to see some support for it soon!

@ivan-digital

Same issue. Would be nice to have support for MPS.

@ageorgios

Same here, please add support for MPS:
https://github.com/ml-explore/mlx

@592319702

(torch-gpu) I542464@DY4GPKX1J0 test % python3 fine_tune_llama_2_in_google_colab.py
/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
Loading checkpoint shards: 100%|██████████| 2/2 [00:32<00:00, 16.06s/it]
/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/peft/utils/other.py:102: FutureWarning: prepare_model_for_int8_training is deprecated and will be removed in a future version. Use prepare_model_for_kbit_training instead.
  warnings.warn(
/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/trl/trainer/sft_trainer.py:159: UserWarning: You didn't pass a max_seq_length argument to the SFTTrainer, this will default to 1024
  warnings.warn(
  0%|          | 0/250 [00:00<?, ?it/s]You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the __call__ method is faster than using a method to encode the text followed by a call to the pad method to get a padded encoding.
use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False...
/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
FP4 quantization state not initialized. Please call .cuda() or .to(device) on the LinearFP4 layer first.
Traceback (most recent call last):
  File "/Users/I542464/test/fine_tune_llama_2_in_google_colab.py", line 229, in <module>
    trainer.train()
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/transformers/trainer.py", line 1809, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/transformers/trainer.py", line 2654, in training_step
    loss = self.compute_loss(model, inputs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/transformers/trainer.py", line 2679, in compute_loss
    outputs = model(**inputs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/peft/peft_model.py", line 922, in forward
    return self.base_model(
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 806, in forward
    outputs = self.model(
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 685, in forward
    layer_outputs = torch.utils.checkpoint.checkpoint(
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/utils/checkpoint.py", line 451, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/utils/checkpoint.py", line 230, in forward
    outputs = run_function(*args)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 681, in custom_forward
    return module(*inputs, output_attentions, None)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 305, in forward
    query_states = self.q_proj(hidden_states)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/peft/tuners/lora.py", line 1123, in forward
    result = super().forward(x)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/bitsandbytes/nn/modules.py", line 221, in forward
    out = bnb.matmul_4bit(x, self.weight.t(), bias=bias, quant_state=self.weight.quant_state)
  File "/Users/I542464/miniconda3/envs/torch-gpu/lib/python3.11/site-packages/bitsandbytes/autograd/_functions.py", line 567, in matmul_4bit
    assert quant_state is not None
AssertionError
  0%|          | 0/250 [00:01<?, ?it/s]

@mbtre

mbtre commented Feb 15, 2024

+1 MPS support would be absolutely great!

@morkapronczay

Adding a comment to keep this alive. MPS support would be awesome!

@rickardp
Contributor Author

Once the device abstraction has been merged, we can start adding MPS-accelerated versions of the functions.

@Satyam7166-tech

Once the device abstraction has been merged, we can start adding MPS-accelerated versions of the functions.

Yay, thanks for all your efforts.
On a side note: how does someone become skilled enough to contribute to this stuff? What topics should they cover?

@sislam-provenir

Looking forward to MPS support!

@anilkul98

Looking forward to MPS Support!!!!

@JohnSilverman

Looking forward to MPS support.

@anelook

anelook commented Nov 16, 2024

+1

1 similar comment
@sojkin

sojkin commented Nov 16, 2024

+1

@liygzting

Please add MPS support!

@Limeslices

+1

3 similar comments
@rholdorf

rholdorf commented Dec 2, 2024

+1

@Phuket2

Phuket2 commented Dec 17, 2024

+1

@ITHwang

ITHwang commented Dec 18, 2024

+1

@SebasCrucer

MPS support! Don't let it die

@rajneesh-git

rajneesh-git commented Dec 28, 2024

Looking forward to MPS support!

@benkap

benkap commented Jan 3, 2025

+1

@lu4p

lu4p commented Jan 6, 2025

I don't know if it helps, but here's https://github.com/filipstrand/mflux, with support for quantized FLUX on MLX.

@ZerocoolZa

Here's a walkthrough with a detailed comparison table, including Python version, required packages, pros, and cons:

Lesson: Optimizing a Llama Model on an Apple M1 Mac with Quantization

Introduction

For those working with Llama models on Apple M1 Macs, using bitsandbytes for 8-bit quantization can present compatibility issues due to limited GPU support. Instead, leveraging PyTorch’s native MPS (Metal Performance Shaders) backend for quantization provides a robust solution that works seamlessly.

Prerequisites
• Python Version: 3.10.x
• Packages Installed:

pip install torch torchvision numpy

• Path to a Llama model checkpoint

Step-by-Step Guide

  1. Install Required Packages

Ensure the necessary libraries are installed:

pip install torch torchvision numpy

  2. Load Your Model

Load your pre-trained Llama model using PyTorch:

import torch

# Load the pre-trained Llama model
model_path = '/path/to/llama_model.pt'
model = torch.load(model_path)

# Move the model to the MPS device
device = torch.device('mps')
model.to(device)

  3. Quantize the Model

Use PyTorch’s dynamic quantization feature for MPS:

from torch.quantization import quantize_dynamic

# Quantize the model (dynamic 8-bit quantization of the Linear layers)
quantized_model = quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

# Save the quantized model
quantized_model_path = '/path/to/quantized_llama_model.pt'
torch.save(quantized_model, quantized_model_path)

print(f"Quantized Model Saved at: {quantized_model_path}")

  4. Inference with the Quantized Model

Perform inference using the quantized model:

# Note: this assumes the loaded model exposes an input_size attribute
input_tensor = torch.randn(1, *model.input_size, device=device)
output = quantized_model(input_tensor)

print(f"Inference Output: {output}")

Key Benefits of This Approach
1. No Need for bitsandbytes: By utilizing native PyTorch functionality, there’s no dependency on external libraries like bitsandbytes.
2. Optimized for MPS: PyTorch’s dynamic quantization handles 8-bit quantization efficiently, leveraging Metal Performance Shaders for GPU-like acceleration.
3. Simple and Seamless: This method integrates smoothly with your existing PyTorch workflow without requiring additional configuration or troubleshooting.
4. Cross-Platform Flexibility: With MPS support, your model is optimized for Apple M1, ensuring compatibility and performance.
Comparison Table

| Feature | bitsandbytes | PyTorch MPS Quantization |
| --- | --- | --- |
| Python Version | 3.x | 3.10.x |
| Required Packages | bitsandbytes, numpy, scipy | torch, torchvision, numpy |
| Backend | GPU (CUDA) | MPS (Metal Performance Shaders) |
| Dependency | External library | Native PyTorch |
| Compatibility | Limited GPU support | Full MPS backend support |
| Optimization | Focused on image models | Flexible for both image & text |
| Performance | Dependent on GPU | Leverages Metal Performance Shaders |
| Seamlessness | Requires additional setup | Integrates seamlessly |
| Cross-Platform | Limited | Fully cross-platform |
| Pros | High-performance GPU quantization | Easy to use, seamless integration, optimized for M1 Mac |
| Cons | Complex setup, external dependency | Limited quantization options compared to bitsandbytes |
Conclusion

This guide provides a practical solution for optimizing Llama models on Apple M1 Macs using native PyTorch MPS quantization. By avoiding external libraries like bitsandbytes, the process remains efficient, reliable, and straightforward for machine learning practitioners.

@chigkim

chigkim commented Feb 4, 2025

I'd really appreciate MPS support please!

@matthewdouglas added the Duplicate label Feb 28, 2025
@matthewdouglas marked this as a duplicate of #1460 Mar 3, 2025
@matthewdouglas marked this as a duplicate of #1406 Mar 3, 2025
@matthewdouglas marked this as a duplicate of #485 Mar 3, 2025
@attentionmech

MPS support, please!

@foggyghost0

MPS support, please! How many more years?

@giadefa

giadefa commented Apr 7, 2025

Having the capability to install it and run it would allow me to debug the code locally.
It does not have to be super optimized at first.

@reneleonhardt

reneleonhardt commented Apr 8, 2025

Prototype contribution #947 from Jan 2, 2024 was closed despite all that hard work... because it was not needed? 🤔

That seems to be the reasoning: #257 (comment)

We will try to add Apple silicon support soon along with AMD and Intel support. We need to figure out next steps first though. It will likely take some weeks, but we will get there. This is one of the highest-priority features that we have and we start working on this soon.

Many contributors have offered to help with Apple silicon since then (#1340).
What happened? Are the efforts being coordinated?

And if there already is a native way in PyTorch (#252 (comment)), can't that be integrated into bitsandbytes, at least as a fallback until a new implementation is ready?

@rickardp
Contributor Author

rickardp commented Apr 8, 2025

Prototype contribution #947 from Jan 2, 2024 was closed despite all that hard work... because it was not needed? 🤔

Most of this was actually merged in separate PRs to make it more manageable. This PR was mainly about making the library portable at all.

Before contributing, I think the architectural direction for this library has to be established. I was arguing at the time that we need a CPU implementation with 100% test coverage to start from, so that MPS support can be added gradually. If my understanding is correct, the latest architectural direction is to unify backend initialization with PyTorch. Once we get to a level where kernels can be added one by one and there are unit tests to verify correctness, I think a lot more can be done by the community. Right now, IMHO, there is quite a dependence on the core maintainers.

@matthewdouglas
Member

Hi folks,

We're not quite there yet, but after merging #1544 we've started to pave the path forward. I want to make it clear that we haven't abandoned Apple silicon support; we have simply had competing priorities to sift through. We're still working toward this.

In fact, we now have a CI test suite that is stable and reliably deterministic on our main branch for the CUDA implementation. The CUDA implementation is always going to be the "gold standard" reference implementation, but this is a step toward being able to validate implementations for new hardware.

We have some new PyTorch native fallback implementations of some custom operators which can be used for reference/fallback on CPU as well as other devices. To @rickardp's point, this will allow incremental support for more optimal implementations on additional platforms, as well as serve as a secondary reference to evaluate against.
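
For readers unfamiliar with that mechanism, here is a minimal illustrative sketch of a PyTorch-native fallback operator registered through torch.library; the op name and the toy absmax quantization scheme are placeholders rather than bitsandbytes code, and it assumes PyTorch >= 2.4:

import torch

@torch.library.custom_op("mylib::absmax_quantize", mutates_args=())
def absmax_quantize(x: torch.Tensor) -> torch.Tensor:
    # Pure-PyTorch reference implementation: runs unchanged on CPU, CUDA, or MPS,
    # and can later be overridden by a device-specific kernel.
    scale = x.abs().amax().clamp(min=1e-12)
    return torch.round(x / scale * 127.0).to(torch.int8)

@absmax_quantize.register_fake
def _(x: torch.Tensor) -> torch.Tensor:
    # Shape/dtype metadata so the op works under torch.compile and meta tracing.
    return torch.empty_like(x, dtype=torch.int8)

# Usage: q = torch.ops.mylib.absmax_quantize(torch.randn(4, 4))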

I have a WIP branch that I have not pushed yet, but it has an implementation of the NF4 quantization/dequantization ops which can run on CPU and, with a little work, even on MPS. I've also acquired an M4 MacBook Pro for future development and validation. Soon we should be able to build out enough plumbing to better enable community contributions of kernel implementations.
