Could not load model meta-llama/Llama-2-7b-chat-hf with any of the following classes

import transformers
from transformers import AutoTokenizer
import torch

model = "meta-llama/Llama-2-7b-chat-hf"
#model = "meta-llama/Llama-2-70b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
    low_cpu_mem_usage=True,
)

sequences = pipeline(
    'I liked "Breaking Bad" and "Band of Brothers". Do you have any recommendations of other shows I might like?\n',
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")

ValueError: Could not load model meta-llama/Llama-2-7b-chat-hf with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.auto.modeling_tf_auto.TFAutoModelForCausalLM'>, <class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>).

Did you find a solution? I’m facing the same problem

Same here. I’ll be happy to learn how to solve this.

I believe it fails when executing this line:

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

Hi, was this resolved? I would appreciate any assistance.

Hello, I’m facing a similar issue running the 7b model using the transformers pipeline as outlined in this blog post. Hopefully there will be a fix soon.

Hi,

This error might occur when you don’t have PyTorch or TensorFlow installed. See the Stack Overflow question "Transformers model from Hugging-Face throws error that specific classes couldn't be loaded".
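A quick way to check this, as a minimal sketch: the helper name `available_backends` is made up for illustration, and it only uses the standard library's `importlib.util.find_spec` to see whether `torch` or `tensorflow` can be imported in the environment you run the script from. It does not verify that the install actually works, only that the package is present.

```python
import importlib.util


def available_backends(candidates=("torch", "tensorflow")):
    """Return the candidate packages that can be found in this environment."""
    # find_spec returns None when a top-level package is not installed.
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


if __name__ == "__main__":
    found = available_backends()
    if found:
        print("Backends found:", ", ".join(found))
    else:
        print("Neither PyTorch nor TensorFlow was found in this environment.")
```

If the list comes back empty, run it again with the same interpreter that runs your script (for example via `poetry run python`), since a different virtual environment may be active.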

Hey all! I was able to reproduce the error you have when using only CPU in Google Colab. After switching to GPU-powered Colab (even free, T4), things work properly.

It turns out there was a bug in Accelerate which has now been fixed.

If you’re running on CPU, make sure to run pip install -U git+https://github.com/huggingface/accelerate.git to pick up the fix. That said, running on at least one GPU is advised.

Now I get a new error:
\torch\nn\modules\linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

Fixed by changing torch.float16 to torch.float32. float16 is for GPU; float32 works on CPU, but it is very slow to produce output.
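To avoid hard-coding the dtype, something like the following might help. It is only a sketch: `pick_dtype_name` is a hypothetical helper, and it assumes that "GPU" means a CUDA device as reported by `torch.cuda.is_available()`, so other accelerators such as MPS fall back to float32 here.

```python
def pick_dtype_name(has_gpu: bool) -> str:
    """Return the name of the torch dtype to pass as torch_dtype."""
    # Half precision is what failed on CPU above, so fall back to float32 there.
    return "float16" if has_gpu else "float32"


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed.")
    else:
        name = pick_dtype_name(torch.cuda.is_available())
        dtype = getattr(torch, name)  # torch.float16 or torch.float32
        print("Use torch_dtype =", dtype)
```

The resulting `dtype` can then be passed as `torch_dtype=dtype` in the pipeline call from the first post.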

The error is due to insufficient free disk space.
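If you want to rule this out before downloading, here is a rough sketch using only the standard library. The helper name `has_free_space` is made up, the 14 GB figure is just an estimate (about 7 billion parameters at 2 bytes each), and "." stands in for wherever your Hugging Face cache actually lives.

```python
import shutil


def has_free_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem containing `path` has at least `required_bytes` free."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    # ASSUMPTION: rough estimate for the 7b weights in half precision;
    # check the model page for the real download size.
    needed = 14 * 10**9
    # ASSUMPTION: "." is a placeholder for the directory holding your model cache.
    print("Enough free space:", has_free_space(".", needed))
```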

I updated accelerate and now I get the following error:

RuntimeError: MPS does not support cumsum op with int64 input

I also get the suggestion to install xformers, but that doesn’t work either. I get the error:

Backend subprocess exited when trying to invoke get_requires_for_build_wheel

  Traceback (most recent call last):
    File "/opt/homebrew/Cellar/poetry/1.5.1/libexec/lib/python3.11/site-packages/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
      main()
    File "/opt/homebrew/Cellar/poetry/1.5.1/libexec/lib/python3.11/site-packages/pyproject_hooks/_in_process/_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/homebrew/Cellar/poetry/1.5.1/libexec/lib/python3.11/site-packages/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
      return hook(config_settings)
             ^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/y6/skpcb0d11fb0yknzv6934h480000gn/T/tmpp_eso2ta/.venv/lib/python3.11/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
      return self._get_build_requires(config_settings, requirements=['wheel'])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/y6/skpcb0d11fb0yknzv6934h480000gn/T/tmpp_eso2ta/.venv/lib/python3.11/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
      self.run_setup()
    File "/private/var/folders/y6/skpcb0d11fb0yknzv6934h480000gn/T/tmpp_eso2ta/.venv/lib/python3.11/site-packages/setuptools/build_meta.py", line 488, in run_setup
      self).run_setup(setup_script=setup_script)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/private/var/folders/y6/skpcb0d11fb0yknzv6934h480000gn/T/tmpp_eso2ta/.venv/lib/python3.11/site-packages/setuptools/build_meta.py", line 338, in run_setup
      exec(code, locals())
    File "<string>", line 23, in <module>
  ModuleNotFoundError: No module named 'torch'


  at /opt/homebrew/Cellar/poetry/1.5.1/libexec/lib/python3.11/site-packages/poetry/installation/chef.py:147 in _prepare
      143│
      144│                 error = ChefBuildError("\n\n".join(message_parts))
      145│
      146│             if error is not None:
    → 147│                 raise error from None
      148│
      149│             return path
      150│
      151│     def _prepare_sdist(self, archive: Path, destination: Path | None = None) -> Path:

Note: This error originates from the build backend, and is likely not a problem with poetry but with xformers (0.0.20) not supporting PEP 517 builds. You can verify this by running 'pip wheel --use-pep517 "xformers (==0.0.20)"'.

I’m using poetry.

I have enough free space, so that’s not the problem in my case.

It looks like you also need to install PyTorch.

I was able to fix the error: RuntimeError: MPS does not support cumsum op with int64 input
by running the following command:
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

I got the same issue and solved it by freeing up more VRAM. So it seems there could be multiple causes behind this error: not enough disk space, not enough GPU memory, PyTorch not installed, etc.
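For the GPU memory case, a back-of-the-envelope check could look like this. It is a sketch under stated assumptions: `weight_bytes` and `weights_fit` are hypothetical helpers, the estimate covers the weights only (activations and the generation cache need extra room), and the 7 billion parameter count is a round figure taken from the model name.

```python
def weight_bytes(num_params: int, bytes_per_param: int) -> int:
    """Estimate the memory needed just to hold the model weights."""
    return num_params * bytes_per_param


def weights_fit(free_bytes: int, num_params: int, bytes_per_param: int) -> bool:
    """Return True if the weights alone would fit in `free_bytes`."""
    return weight_bytes(num_params, bytes_per_param) <= free_bytes


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed.")
    else:
        if torch.cuda.is_available():
            # mem_get_info returns (free, total) in bytes for the current device.
            free, total = torch.cuda.mem_get_info()
            # ASSUMPTION: ~7e9 parameters stored as float16 (2 bytes each).
            print("Weights fit in free VRAM:", weights_fit(free, 7_000_000_000, 2))
        else:
            print("No CUDA device available.")
```

If this prints False, closing other processes that hold VRAM or picking a smaller model is probably the first thing to try.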

It seems like in my case, the issue is with torch. I’m setting up the environment using the following TOML:

[tool.poetry]
name = "llama-shell"
version = "0.1.0"
description = ""
authors = ["Dror Atariah <drorata@gmail.com>"]
readme = "README.md"
packages = [{ include = "llama_shell" }]

[tool.poetry.dependencies]
python = "3.9.13"
transformers = "^4.31.0"
huggingface-hub = { version = "^0.16.4", extras = ["cli", "torch"] }
accelerate = { git = "https://github.com/huggingface/accelerate.git" }


[tool.poetry.group.dev.dependencies]
ipykernel = "^6.24.0"


[[tool.poetry.source]]
name = "pytorch-night"
url = "https://download.pytorch.org/whl/nightly/cpu"
priority = "explicit"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

Then, I run:

poetry env use ~/.pyenv/versions/3.9.13/bin/python3  # Pinning this version due to [1]
poetry lock
poetry install

Then, when running poetry run python tutorial_test.py I get the error RuntimeError: MPS does not support cumsum op with int64 input.

I tried to switch to a nightly build of torch by running poetry add --source pytorch-night torch torchvision torchaudio. But I get the following error:

Package operations: 3 installs, 1 update, 0 removals

  • Installing pillow (10.0.0)
  • Updating torch (2.0.1 -> 2.1.0.dev20230801+cpu): Failed

  RuntimeError

  Unable to find installation candidates for torch (2.1.0.dev20230801+cpu)

[1] Unable to find installation candidates for torch (1.13.1) · Issue #2991 · langchain-ai/langchain · GitHub

I had the same problem. The only way I was able to fix it was to use the CUDA build of torch instead (the preview nightly built for CUDA 12.1 worked with my CUDA 12.2 install). Running on the GPU avoids this issue as well as the follow-up issues that appear after installing xformers, which leads me to believe that running this on the CPU is perhaps just not viable.

Do we already have a solution for this issue? I was able to run the llama-2-13b-chat-hf for a week or so. But for some reason, I got this error today that the model can’t be loaded. I didn’t change anything in the code or the virtual env though.

ValueError: Could not load model meta-llama/Llama-2-13b-chat-hf with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>).

Do not use transformers.pipeline with device_map="auto"; import pipeline directly and pass an explicit device instead.

Here you go:
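import torch
from transformers import AutoTokenizer, pipeline

model = "meta-llama/Llama-2-7b-chat-hf"  # model ID from the first post
tokenizer = AutoTokenizer.from_pretrained(model)

device = torch.device("cuda")  # requires a CUDA-capable GPU
# Use a different name so the imported pipeline() function is not shadowed.
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device=device,
)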