TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'load_in_4bit'

I’m trying to test the new QLoRA model (guanaco-7b) locally but I’m facing an error loading the Llama model.

This is the code to load the model:

# Load the model.
# Note: It can take a while to download LLaMA and add the adapter modules.
# You can also use the 13B model by loading in 4bits.

import torch
from peft import PeftModel    
from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaTokenizer, StoppingCriteria, StoppingCriteriaList, TextIteratorStreamer

model_name = "decapoda-research/llama-7b-hf"
adapters_name = 'timdettmers/guanaco-7b'

print(f"Starting to load the model {model_name} into memory")

m = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_4bit=True,
    torch_dtype=torch.bfloat16,
    device_map={"": 0}
)
m = PeftModel.from_pretrained(m, adapters_name)
#m = m.merge_and_unload()
tok = LlamaTokenizer.from_pretrained(model_name)
tok.bos_token_id = 1

stop_token_ids = [0]

print(f"Successfully loaded the model {model_name} into memory")

And this is the error that I face:

TypeError: LlamaForCausalLM.__init__() got an unexpected keyword argument 'load_in_4bit'

The funny thing is that the same code works when I run it in Colab, so I thought it must be a version issue. But the transformers version is the same in both cases:

import transformers
print(transformers.__version__)

"4.30.0.dev0"

In both cases, the packages are installed like this:

# Install latest bitsandbytes & transformers, accelerate from source
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
# Other requirements for the demo
!pip install gradio
!pip install sentencepiece

But still, for some reason Colab can run the code while it fails locally!
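
Since the reported version strings match but the behaviour differs, a quick environment check can help narrow things down. This is only a diagnostic sketch: it prints where transformers is actually imported from and whether bitsandbytes and a CUDA GPU (both required for 4-bit loading) are visible to that interpreter.

import importlib.util

import torch
import transformers

# Which transformers installation is this interpreter actually using?
print(transformers.__version__)
print(transformers.__file__)

# 4-bit loading relies on bitsandbytes and a CUDA-capable GPU.
print("bitsandbytes installed:", importlib.util.find_spec("bitsandbytes") is not None)
print("CUDA available:", torch.cuda.is_available())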

If you'd like to give it a try yourself, this is the link to the Colab:

https://colab.research.google.com/drive/17XEqL1JcmVWjHkT-WczdYkJlNINacwG7?usp=sharing

And this is the repo that I got the link from:

A reply from @rhamnett suggested passing the 4-bit settings through a BitsAndBytesConfig instead:
# Load the model.
# Note: It can take a while to download LLaMA and add the adapter modules.
# You can also use the 13B model by loading in 4bits.

import torch
from peft import PeftModel    
from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaTokenizer, StoppingCriteria, StoppingCriteriaList, TextIteratorStreamer, BitsAndBytesConfig
from torch import cuda, bfloat16

model_name = "decapoda-research/llama-13b-hf"
adapters_name = 'timdettmers/guanaco-13b'

print(f"Starting to load the model {model_name} into memory")

m = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type='nf4',
    ),
    torch_dtype=torch.bfloat16,
    device_map={"": 0}
)
m = PeftModel.from_pretrained(m, adapters_name)
#m = m.merge_and_unload()
tok = LlamaTokenizer.from_pretrained(model_name)
tok.bos_token_id = 1

stop_token_ids = [0]

print(f"Successfully loaded the model {model_name} into memory")

Thanks, @rhamnett. Your solution is technically correct and it runs, but it does not quantize the model itself, and as a result my machine runs out of VRAM. Basically, it does not use QLoRA, and using QLoRA is the whole point.

To reiterate, load_in_4bit=True must be part of the from_pretrained() arguments, or the model is not quantized and the GPU runs out of memory. What I don't understand is why that argument is sometimes accepted and sometimes not, even though the version number matches in both cases. It does not make sense to me at all.

I have managed to run my version of the code (from the question) inside a Docker container without running out of GPU memory. But the same code still fails to load the model directly on my host OS. It's a very annoying error.

Try rebuilding from source:

	pip install -q -U bitsandbytes
	pip install -q -U git+https://github.com/huggingface/transformers.git
	pip install -q -U git+https://github.com/huggingface/peft.git
	pip install -q -U git+https://github.com/huggingface/accelerate.git

I am also having the same issue. It fails locally (Mac M2 Max) but runs in Colab.

!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
from transformers import AutoModel, AutoTokenizer

model_id = "sentence-transformers/multi-qa-mpnet-base-cos-v1"

model = AutoModel.from_pretrained(model_id, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
Installed package versions:

- peft @ git+https://github.com/huggingface/peft.git@189a6b8e357ecda05ccde13999e4c35759596a67
- transformers @ git+https://github.com/huggingface/transformers.git@deff5979fee1f989d26e4946c92a5c35ce695af8
- accelerate @ git+https://github.com/huggingface/accelerate.git@665d5180fcc01d5700f7a9aa3f9bdb75c6055dce
- bitsandbytes==0.39.0

I do have sentence-transformers installed as well.


Oh, never mind. It's not compatible:

Note that this method is only compatible with GPUs, hence it is not possible to quantize models in 4bit on a CPU.
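
If the same notebook has to run both on Colab (with a GPU) and on a machine without CUDA such as an M2 Mac, one workaround is to only request 4-bit quantization when a CUDA device is actually present. A minimal sketch, not an official pattern:

import torch
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

model_id = "sentence-transformers/multi-qa-mpnet-base-cos-v1"

if torch.cuda.is_available():
    # 4-bit quantization via bitsandbytes requires a CUDA GPU.
    model = AutoModel.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),
        device_map={"": 0},
    )
else:
    # Fall back to a regular, unquantized load on CPU / Apple Silicon.
    model = AutoModel.from_pretrained(model_id)

tokenizer = AutoTokenizer.from_pretrained(model_id)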


I tried the above code in my setup, but BitsAndBytesConfig and the rest of the classes are not getting imported. Below is the transformers library version:

Name: transformers
Version: 4.30.2
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /home/user/jupyter_env/lib/python3.11/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: peft, sentence-transformers

The bitsandbytes version details are shown below:

Name: bitsandbytes
Version: 0.40.1.post1
Summary: k-bit optimizers and matrix multiplication routines.
Home-page: https://github.com/TimDettmers/bitsandbytes
Author: Tim Dettmers
Author-email: dettmers@cs.washington.edu
License: MIT
Location: /home/kamal/jupyter_env/lib/python3.11/site-packages
Requires: 
Required-by:

The error that occurs is shown below:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[16], line 1
----> 1 from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList, BitsAndBytesConfig
      2 from torch import cuda, bfloat16

ImportError: cannot import name 'BitsAndBytesConfig' from 'transformers' (/home/user/jupyter_env/lib/python3.11/site-packages/transformers/__init__.py)

The code is in a Jupyter notebook running inside a virtual environment. I'm unable to figure out the cause of the issue.

What is it that I am missing? Could you please help?
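
One thing worth ruling out (an assumption on my part, since the pip show output above looks fine) is that the Jupyter kernel is importing transformers from a different environment than the one the packages were installed into. BitsAndBytesConfig should be importable from transformers 4.30.2, so checking inside the notebook which interpreter and installation are actually in use can help:

import sys
import transformers

# Which interpreter is the notebook kernel running on?
print(sys.executable)

# Which transformers installation (and version) does that interpreter pick up?
print(transformers.__version__)
print(transformers.__file__)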

Is there a way to push the adapter model using the .push_to_hub method? I’m curious about why the TypeError occurred when I installed the GitHub version of the package.
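
On the push_to_hub question: PEFT models expose push_to_hub, which uploads only the adapter weights. A minimal sketch, assuming m and tok from the earlier snippets, a placeholder repo id, and a machine already logged in via huggingface-cli login:

# Push only the adapter weights (placeholder repo id).
m.push_to_hub("your-username/guanaco-7b-adapter")

# Optionally push the tokenizer so the repo is self-contained.
tok.push_to_hub("your-username/guanaco-7b-adapter")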