Error when quantizing CodeLlama 70B

When I use bitsandbytes to quantize CodeLlama 70B, I run into an error.
My code is:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

MODEL_NAME = "codellama/CodeLlama-70b-hf"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    use_safetensors=True,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)


And this is the error:

[screenshot of the error traceback]

[screenshot of the other part of the traceback]

Hi! The error message says that you don't have enough GPU memory to load the 70B model. For the workaround with CPU offloading, you can follow the link in the error message :slight_smile:
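Roughly, the offloading setup looks like this. This is only a sketch, not something tested on your machine: it assumes a recent transformers/bitsandbytes that supports the llm_int8_enable_fp32_cpu_offload flag, and the max_memory limits are placeholder values you should adjust to your actual GPU VRAM and system RAM.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

MODEL_NAME = "codellama/CodeLlama-70b-hf"

# llm_int8_enable_fp32_cpu_offload lets the modules that do not fit on the
# GPU stay on the CPU in fp32 instead of raising an out-of-memory error.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    llm_int8_enable_fp32_cpu_offload=True,
)

# max_memory caps what each device may hold; the values below are
# placeholders -- tune them to your own hardware.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    use_safetensors=True,
    quantization_config=bnb_config,
    device_map="auto",
    max_memory={0: "22GiB", "cpu": "64GiB"},
)

Keep in mind that layers offloaded to the CPU run much slower than the ones on the GPU, so this trades speed for being able to load the model at all.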