How can I set `max_memory` parameter while loading Quantized model with Model Pipeline class?

Thanks for your kind response! :man_bowing:
Unfortunately, downgrading transformers to 4.49.0 didn't work in my case 😭
It still fails with the same error as below:

File "~/anaconda3/envs/sample_env/lib/python3.11/site-packages/bitsandbytes/nn/modules.py", line 464, in _save_to_state_dict
    for k, v in self.weight.quant_state.as_dict(packed=True).items():
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/anaconda3/envs/sample_env/lib/python3.11/site-packages/bitsandbytes/functional.py", line 810, in as_dict
    "nested_offset": self.offset.item(),
                     ^^^^^^^^^^^^^^^^^^
NotImplementedError: aten::_local_scalar_dense: attempted to run this operator with Meta tensors, but there was no abstract impl or Meta kernel registered. You may have run into this message while using an operator with PT2 compilation APIs (torch.compile/torch.export); in order to use this operator with those APIs you'll need to add an abstract impl. Please see the following doc for next steps: https://docs.google.com/document/d/1_W62p8WJOQQUzPsJYa7s701JXt0qf2OfLub2sbkHOaU/edit
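
For what it's worth, the traceback points at `_save_to_state_dict` hitting weights that are still on the meta device (i.e. never materialized with real storage). This is just a guess, but a quick way to check is to list any such parameters before anything builds a state dict. A minimal diagnostic sketch, assuming `model` is your already-loaded model:

```python
import torch

def find_meta_params(model: torch.nn.Module) -> list[str]:
    # Return names of parameters still on the meta device (no real storage),
    # which is what the NotImplementedError above complains about.
    return [name for p_name, p in model.named_parameters() if (name := p_name) and p.device.type == "meta"]

# Example usage (hypothetical): run before save_pretrained / state_dict calls.
# offending = find_meta_params(model)
# if offending:
#     print("Still on meta device:", offending)
```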

However, as you suggested, it works properly with unsloth/Qwen2.5-VL-3B-Instruct-unsloth-bnb-4bit!
Since only the original checkpoint fails, the problem might be related to how its weights were saved rather than to my environment.
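
In case it helps anyone searching later, here is a minimal sketch of how I understand `max_memory` can be passed through the pipeline class: `model_kwargs` is forwarded to `from_pretrained`, which is where `max_memory` is consumed. The task name and the memory limits below are just example assumptions, not values from this thread:

```python
from transformers import pipeline

# Minimal sketch (example values): max_memory keys are GPU indices or "cpu",
# and the dict is forwarded to from_pretrained via model_kwargs.
pipe = pipeline(
    "image-text-to-text",  # assumed task for Qwen2.5-VL; adjust for your model
    model="unsloth/Qwen2.5-VL-3B-Instruct-unsloth-bnb-4bit",
    device_map="auto",
    model_kwargs={"max_memory": {0: "6GiB", "cpu": "16GiB"}},
)
```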

Thank you again for your kind reply, and I hope you have a great day! :man_bowing:


PS: In case anyone else runs into a similar problem, I'm sharing my environment below in the hope it helps.

  • Python 3.11.11
  • CUDA 12.1
  • torch==2.3.1+cu121
  • torchvision==0.18.1+cu121
  • accelerate==1.5.2
  • bitsandbytes==0.45.3
  • flash-attn==2.7.3
  • transformers==4.49.0