Error loading Llama model

I have a problem loading the Llama-2-13b-hf model with the following code:

LlamaForCausalLM.from_pretrained(base_model, trust_remote_code=True, device_map="cuda:0", load_in_8bit=True)

It returns the following error:
ImportError: Using `load_in_8bit=True` requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes `pip install -i https://test.pypi.org/simple/ bitsandbytes` or `pip install bitsandbytes`.

But I am pretty sure both packages are installed, with versions bitsandbytes 0.41.3 and accelerate 0.25.0. Has anyone encountered this issue before and know how to resolve it? Thank you very much!

Hi,

Are you running in Google Colab? If yes, restarting the runtime may help.

Hi nielsr, I am running it in AWS SageMaker. I found a solution on the forum: after downgrading transformers to 4.30.0 the error disappears, but a new error emerges: SafetensorError: Error while deserializing header: HeaderTooLarge. I wonder if you have any clue about that. Thank you!

It should also work with the latest Transformers version. Could you try creating a new environment?
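
Before rebuilding, it can help to confirm which package versions the running interpreter actually sees, since a notebook kernel may use a different environment than the one pip installed into. The following is only a minimal sketch using the standard library; the helper name installed_versions is hypothetical and not part of Transformers, Accelerate, or bitsandbytes.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} as seen by the running interpreter.

    Hypothetical helper for illustration; None means the package is not
    installed in the environment this interpreter is using.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The interpreter path shows which environment the kernel is running from.
    print("interpreter:", sys.executable)
    found = installed_versions(["transformers", "accelerate", "bitsandbytes"])
    for name, version in found.items():
        print(f"{name}: {version or 'NOT FOUND in this environment'}")
```

If a package shows up as not found here even though pip reports it as installed, the install probably went into a different environment, and restarting the kernel or creating a fresh environment is the likely fix.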

Yes, with the latest version of Transformers that error is gone as well. Thank you very much!