### System Info
Ubuntu 22.04, Python 3.10.4, Intel CPU
bitsandbytes==0.43.3
transformers==4.43.3
### Reproduction
```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct",
    torch_dtype=torch.float16,
    device_map="cpu",
    quantization_config=quantization_config,
)
base_model.dequantize()
```
Error:
```
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[24], line 1
----> 1 base_model.dequantize()
File ~/projects/ml/venv/lib/python3.10/site-packages/transformers/modeling_utils.py:1394, in PreTrainedModel.dequantize(self)
1391 if hf_quantizer is None:
1392 raise ValueError("You need to first quantize your model in order to dequantize it")
-> 1394 return hf_quantizer.dequantize(self)
File ~/projects/ml/venv/lib/python3.10/site-packages/transformers/quantizers/base.py:202, in HfQuantizer.dequantize(self, model)
197 def dequantize(self, model):
198 """
199 Potentially dequantize the model to retrive the original model, with some loss in accuracy / performance.
200 Note not all quantization schemes support this.
201 """
--> 202 model = self._dequantize(model)
204 # Delete quantizer and quantization config
205 del model.hf_quantizer
File ~/projects/ml/venv/lib/python3.10/site-packages/transformers/quantizers/quantizer_bnb_4bit.py:320, in Bnb4BitHfQuantizer._dequantize(self, model)
317 def _dequantize(self, model):
318 from ..integrations import dequantize_and_replace
--> 320 model = dequantize_and_replace(
321 model, self.modules_to_not_convert, quantization_config=self.quantization_config
322 )
323 return model
File ~/projects/ml/venv/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py:458, in dequantize_and_replace(model, modules_to_not_convert, quantization_config)
453 def dequantize_and_replace(
454 model,
455 modules_to_not_convert=None,
456 quantization_config=None,
457 ):
--> 458 model, has_been_replaced = _dequantize_and_replace(
459 model,
460 modules_to_not_convert=modules_to_not_convert,
461 quantization_config=quantization_config,
462 )
464 if not has_been_replaced:
465 logger.warning(
466 "For some reason the model has not been properly dequantized. You might see unexpected behavior."
467 )
File ~/projects/ml/venv/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py:441, in _dequantize_and_replace(model, modules_to_not_convert, current_key_name, quantization_config, has_been_replaced)
439 model._modules[name] = new_module
440 if len(list(module.children())) > 0:
--> 441 _, has_been_replaced = _dequantize_and_replace(
442 module,
443 modules_to_not_convert,
444 current_key_name,
445 quantization_config,
446 has_been_replaced=has_been_replaced,
447 )
448 # Remove the last key for recursion
449 current_key_name.pop(-1)
File ~/projects/ml/venv/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py:441, in _dequantize_and_replace(model, modules_to_not_convert, current_key_name, quantization_config, has_been_replaced)
439 model._modules[name] = new_module
440 if len(list(module.children())) > 0:
--> 441 _, has_been_replaced = _dequantize_and_replace(
442 module,
443 modules_to_not_convert,
444 current_key_name,
445 quantization_config,
446 has_been_replaced=has_been_replaced,
447 )
448 # Remove the last key for recursion
449 current_key_name.pop(-1)
[... skipping similar frames: _dequantize_and_replace at line 441 (1 times)]
File ~/projects/ml/venv/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py:441, in _dequantize_and_replace(model, modules_to_not_convert, current_key_name, quantization_config, has_been_replaced)
439 model._modules[name] = new_module
440 if len(list(module.children())) > 0:
--> 441 _, has_been_replaced = _dequantize_and_replace(
442 module,
443 modules_to_not_convert,
444 current_key_name,
445 quantization_config,
446 has_been_replaced=has_been_replaced,
447 )
448 # Remove the last key for recursion
449 current_key_name.pop(-1)
File ~/projects/ml/venv/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py:425, in _dequantize_and_replace(model, modules_to_not_convert, current_key_name, quantization_config, has_been_replaced)
422 else:
423 state = None
--> 425 new_module.weight = torch.nn.Parameter(dequantize_bnb_weight(module.weight, state))
427 if bias is not None:
428 new_module.bias = bias
File ~/projects/ml/venv/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py:349, in dequantize_bnb_weight(weight, state)
346 return weight
348 if cls_name == "Params4bit":
--> 349 output_tensor = bnb.functional.dequantize_4bit(weight.data, weight.quant_state)
350 logger.warning_once(
351 f"The model is going to be dequantized in {output_tensor.dtype} - if you want to upcast it to another dtype, make sure to pass the desired dtype when quantizing the model through `bnb_4bit_quant_type` argument of `BitsAndBytesConfig`"
352 )
353 return output_tensor
File ~/projects/ml/venv/lib/python3.10/site-packages/bitsandbytes/functional.py:1333, in dequantize_4bit(A, quant_state, absmax, out, blocksize, quant_type)
1330 raise NotImplementedError(f"4-bit quantization data type {quant_type} is not implemented.")
1332 if quant_state is None:
-> 1333 assert absmax is not None and out is not None
1335 quant_state = QuantState(
1336 absmax=absmax,
1337 shape=out.shape,
(...)
1340 quant_type=quant_type,
1341 )
1343 else:
AssertionError:
```
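Judging by the last frame, `bnb.functional.dequantize_4bit` is reached with `weight.quant_state` set to `None`, so the `absmax is not None and out is not None` assertion fires. A minimal check to confirm which 4-bit parameters are missing their quant state (a sketch; it only assumes the `base_model` from the reproduction above):

```python
import bitsandbytes as bnb

# List every 4-bit parameter whose quant_state is None; these are the
# parameters that make bnb.functional.dequantize_4bit hit the assertion.
for name, param in base_model.named_parameters():
    if isinstance(param, bnb.nn.Params4bit) and param.quant_state is None:
        print(f"{name}: quant_state is None")
```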
### Expected behavior
`dequantize()` should work on CPU just as it does on an NVIDIA GPU, restoring the model to regular fp16 `nn.Linear` modules instead of raising an `AssertionError`.
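For comparison, a sketch of the same flow on GPU (assuming a CUDA device is available; `device_map="cuda:0"` is the only change from the CPU reproduction), where dequantization completes:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
# Identical call, but quantized on a CUDA device instead of the CPU.
gpu_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct",
    torch_dtype=torch.float16,
    device_map="cuda:0",
    quantization_config=quantization_config,
)
gpu_model.dequantize()  # no AssertionError here
```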