Error when quantizing CodeLlama 70B

When I use bitsandbytes to quantize CodeLlama 70B, I run into an error.
My code is:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

MODEL_NAME = "codellama/CodeLlama-70b-hf"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    use_safetensors=True,
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)


And this is the error:

[screenshot of the error traceback]

[screenshot of the other part of the traceback]

Hi! The error message says that you don't have enough GPU memory to load the 70B model. For the workaround with CPU offloading, you can follow the link in the error message :slight_smile:
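Roughly, the offloading setup looks like this. This is only a sketch, not something tested on your machine: it assumes a recent transformers/bitsandbytes that supports the llm_int8_enable_fp32_cpu_offload flag, and the max_memory limits are placeholder values you should adjust to your actual GPU VRAM and system RAM.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

MODEL_NAME = "codellama/CodeLlama-70b-hf"

# llm_int8_enable_fp32_cpu_offload lets the modules that do not fit on the
# GPU stay on the CPU in fp32 instead of raising an out-of-memory error.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    llm_int8_enable_fp32_cpu_offload=True,
)

# max_memory caps what each device may hold; the values below are
# placeholders -- tune them to your own hardware.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    use_safetensors=True,
    quantization_config=bnb_config,
    device_map="auto",
    max_memory={0: "22GiB", "cpu": "64GiB"},
)

Keep in mind that layers offloaded to the CPU run much slower than the ones on the GPU, so this trades speed for being able to load the model at all.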