BitsandBytes conflict with Accelerate

I’m running inference on a custom VLM-derived model. Inference works fine when using the weights in their bfloat16 precision. However, when I define a BitsAndBytes config, I receive an error that I suspect is due to a conflict between bitsandbytes and Accelerate, where both try to set the compute device, generating the following stack trace.

Traceback (most recent call last):
  File "/home/tyr/RobotAI/openvla/scripts/extern/verify_prismatic.py", line 147, in <module>
    verify_prismatic()
  File "/home/tyr/miniforge3/envs/openvla/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/tyr/RobotAI/openvla/scripts/extern/verify_prismatic.py", line 97, in verify_prismatic
    vlm = AutoModelForVision2Seq.from_pretrained(
  File "/home/tyr/miniforge3/envs/openvla/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
    return model_class.from_pretrained(
  File "/home/tyr/miniforge3/envs/openvla/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3735, in from_pretrained
    dispatch_model(model, **device_map_kwargs)
  File "/home/tyr/miniforge3/envs/openvla/lib/python3.10/site-packages/accelerate/big_modeling.py", line 499, in dispatch_model
    model.to(device)
  File "/home/tyr/miniforge3/envs/openvla/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2670, in to
    raise ValueError(
ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.

This is the code that generated the above stack trace:

    vlm = AutoModelForVision2Seq.from_pretrained(
        MODEL_PATH,
        attn_implementation="flash_attention_2",
        torch_dtype=torch.float16,
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),
        low_cpu_mem_usage=True,
        trust_remote_code=True,
    )

I’ve checked that the model is not being moved with a .to() call anywhere in my code. I’ve also tried adding device_map=None and setting torch_dtype="auto", but neither resolves the issue.

Has anyone encountered this error before or have some suggestions about what might be going wrong? Thanks!


Hmm… How about device_map="cuda" or device_map="sequential"?

I’ve tried setting device_map to "cuda" or "sequential", but no luck.
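
For reference, the variants I’ve tried look roughly like this (a sketch; MODEL_PATH and the unchanged arguments are the same as in the snippet above):

import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

vlm = AutoModelForVision2Seq.from_pretrained(
    MODEL_PATH,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.float16,          # also tried torch_dtype="auto"
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="cuda",                  # also tried None and "sequential"
    low_cpu_mem_usage=True,
    trust_remote_code=True,
)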

I checked the GitHub issue that you mentioned and applied the code changes suggested by its associated pull request (my accelerate package is 1.5.1). This moved me past the ValueError, but I then encountered the next error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat2 in method wrapper_CUDA_bmm)

This error is generated when I try calling the generate() method of the AutoModelForVision2Seq model.

 gen_ids = vlm.generate(**inputs, do_sample=False, min_length=1, max_length=512)

The processor is used like so:

inputs = processor(prompt, image).to(device, dtype=torch.bfloat16)

I’ve been toggling the loading parameters on and off (e.g. turning off flash-attn, turning off low_cpu_mem_usage), but the two-devices RuntimeError persists. Right now I still can’t identify which part of the weights, inputs, or operations is being run on the CPU.
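
One check I still want to run is dumping whatever ended up on the CPU, roughly like this (a sketch; vlm is the model loaded above):

# List parameters/buffers that ended up on the CPU, and print the device map
# that accelerate assigned (hf_device_map is only set when a device_map is used).
cpu_params = [name for name, p in vlm.named_parameters() if p.device.type == "cpu"]
cpu_buffers = [name for name, b in vlm.named_buffers() if b.device.type == "cpu"]
print("parameters on cpu:", cpu_params[:10])
print("buffers on cpu:", cpu_buffers[:10])
print("hf_device_map:", getattr(vlm, "hf_device_map", None))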


Hmm… How about this for debugging?

inputs = processor(prompt, image).to(device, dtype=torch.bfloat16)
print(inputs.device)
print(model.device)

There wasn’t an inputs.device attribute, but I did the following:

for k, v in inputs.items():
    print(f"{k}: {v.device}")
print(f"model device: {vlm.device}")

and the output is

input_ids: cuda:0
attention_mask: cuda:0
pixel_values: cuda:0
model device: cuda:0

Hmm… It worked.

from transformers import AutoModelForVision2Seq, AutoProcessor, BitsAndBytesConfig
from PIL import Image
import torch
from transformers.image_utils import load_image

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16

# Load Processor & VLA
processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    #attn_implementation="flash_attention_2", # [Optional] Requires `flash_attn`
    torch_dtype=dtype,
    #low_cpu_mem_usage=True,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_use_double_quant=True, bnb_4bit_compute_dtype=dtype),
    trust_remote_code=True
).to(device)

image = load_image("https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg")
prompt = "In: What action should the robot take to {<INSTRUCTION>}?\nOut:"

inputs = processor(prompt, image).to(device, dtype=dtype)
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
print(action)
#  x = F.scaled_dot_product_attention(
#[ 4.62235402e-04  6.69859354e-03  4.95526172e-03 -6.52482310e-03
#  9.93723747e-03  1.28276732e-02  9.96078431e-01]

# accelerate                1.0.1
# bitsandbytes              0.45.1
# torch                     2.4.0+cu124
# transformers              4.49.0.dev0

Well, I tried version matching one package at a time, and it turns out the transformers version was the issue. The project pins transformers at 4.40.1, but upgrading to 4.49.0 resolved the errors.

The above GitHub issue didn’t seem to apply to this case, surprisingly enough. When I was version matching, I removed and reinstalled accelerate and so wiped out the edits I had made, but both stock versions of accelerate (1.5.1, which I was using, as well as 1.0.1) work just fine. The bitsandbytes version didn’t seem to matter either (0.45.5, which was mine, and 0.45.1, which is yours).
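
For anyone comparing environments, something like this prints the versions that are actually active in the interpreter (a minimal check, nothing project-specific):

# Print the versions importable in the current environment.
import accelerate, bitsandbytes, torch, transformers

for pkg in (transformers, accelerate, bitsandbytes, torch):
    print(f"{pkg.__name__:<15} {pkg.__version__}")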

I don’t know what changed between transformers 4.40.1 and 4.49.0, but it probably has something to do with how transformers orchestrates the accelerate and bitsandbytes packages.

Also, thank you very much for your time and help!! I really appreciate it!
