Loading a safetensors format model using Hugging Face Transformers

I am trying to load the ‘notstoic/pygmalion-13b-4bit-128g’ model using Hugging Face’s Transformers library, but I am running into an error when loading the model, which is saved in the new safetensors format.

Here’s the code I’m using:

from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("path/to/model")
model = LlamaForCausalLM.from_pretrained("path/to/model", use_safetensors=True)

However, this code results in the following error:

Traceback (most recent call last):
  File "/Users/maxhager/Projects2023/nsfw/model_run.py", line 4, in <module>
    model = LlamaForCausalLM.from_pretrained("path/to/model", use_safetensors=True)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/maxhager/.virtualenvs/nsfw/lib/python3.11/site-packages/transformers/modeling_utils.py", line 2449, in from_pretrained
    raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory path/to/model.

I’m confused by this error because I’ve set use_safetensors=True, as the model is stored in safetensors format. In the model directory (path/to/model), I have the following files:

  • 4bit-128g.safetensors
  • config.json
  • generation_config.json
  • pytorch_model.bin.index.json
  • special_tokens_map.json
  • tokenizer.json
  • tokenizer.model
  • tokenizer_config.json
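
For completeness, one way to double-check what is actually inside 4bit-128g.safetensors is to list its tensor names with the safetensors package (a quick sketch; I have not run this against this particular checkpoint):

from safetensors import safe_open

# Print the tensor names stored in the checkpoint. If this is a GPTQ-style
# 4-bit checkpoint, the names may be quantizer tensors (qweight, qzeros,
# scales, ...) rather than a plain state dict.
with safe_open("path/to/model/4bit-128g.safetensors", framework="pt") as f:
    for key in f.keys():
        print(key)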

It seems like the from_pretrained() function is not recognizing the safetensors file and is instead looking for the usual weight files (pytorch_model.bin, tf_model.h5, etc.).
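
My guess is that from_pretrained() only picks up a safetensors checkpoint when it has the default name (model.safetensors, or sharded files listed in a model.safetensors.index.json), so 4bit-128g.safetensors is simply never found. Below is a minimal workaround sketch based on that assumption; the stray pytorch_model.bin.index.json might also mislead the loader, so I move it aside too. (If the file really contains GPTQ quantizer tensors, a plain Transformers load may still fail and a GPTQ-aware loader would be needed.)

import os
from transformers import LlamaForCausalLM

# Assumption: from_pretrained looks for the default filename, so rename the
# checkpoint to model.safetensors.
os.rename(
    "path/to/model/4bit-128g.safetensors",
    "path/to/model/model.safetensors",
)

# The index file references pytorch_model.bin shards that are not present,
# which could send the loader down the .bin path; move it out of the way.
os.rename(
    "path/to/model/pytorch_model.bin.index.json",
    "path/to/model/pytorch_model.bin.index.json.bak",
)

model = LlamaForCausalLM.from_pretrained("path/to/model", use_safetensors=True)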

I would appreciate it if anyone could explain why this is happening and how I can successfully load this model.

Any luck? I am considering using this format as well.

Send me a DM on Twitter (@MaxHager66) if this is something you still care about.