Run a Mistral model on CPU only

Hey there,
unfortunately I do not have a GPU, so I want to run the Mistral code on my CPU only. I have a RAG project and everything works just fine except for the following Mistral part:

from transformers import DPRContextEncoder, DPRContextEncoderTokenizer
import torch
import faiss   # used for indexing; install with: pip install faiss-cpu
from transformers import (RagRetriever,
                          RagSequenceForGeneration,
                          RagTokenizer)
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
.....
print("Mistral Models")
model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.1-GPTQ"
# To use a different branch, change revision
# For example: revision="gptq-4bit-32g-actorder_True"
tokenizer_2 = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

print("HUUUHUUUU")
# removed device_map="cuda:0" here since I have no CUDA
model_2 = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                               trust_remote_code=False,
                                               revision="gptq-4bit-32g-actorder_True")
# Save the Mistral model and tokenizer
print("saving mistral now....")
model_2.save_pretrained("mistralModel")
tokenizer_2.save_pretrained("mistralTokenizer")

The error I get is:

.....
Mistral Models
HUUUHUUUU
CUDA extension not installed.
CUDA extension not installed.
Traceback (most recent call last):
  File ".....\test.py", line 263, in <module>
    model_2 = AutoModelForCausalLM.from_pretrained(model_name_or_path,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".....\Python311\Lib\site-packages\transformers\models\auto\auto_factory.py", line 566, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".....\Python311\Lib\site-packages\transformers\modeling_utils.py", line 3928, in from_pretrained
    model = quantizer.post_init_model(model)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".....Python311\Lib\site-packages\optimum\gptq\quantizer.py", line 587, in post_init_model
    raise ValueError(
ValueError: Found modules on cpu/disk. Using Exllama or Exllamav2 backend requires all the modules to be on GPU.You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object
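
If I read the last line of the error correctly, it suggests deactivating the exllama backend through the quantization config. Below is a minimal sketch of what I assume that would look like using transformers' GPTQConfig (untested on my side, and I do not know whether GPTQ inference actually works on CPU at all):

from transformers import AutoModelForCausalLM, GPTQConfig

model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.1-GPTQ"

# Assumption: overriding the checkpoint's quantization config with
# disable_exllama=True turns off the GPU-only exllama kernels.
quantization_config = GPTQConfig(bits=4, disable_exllama=True)
model_2 = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                               trust_remote_code=False,
                                               revision="gptq-4bit-32g-actorder_True",
                                               quantization_config=quantization_config)

Is that the intended way to disable exllama, and would the model then actually run on CPU?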

I have already tried many different things, but I could not fix this issue. I would appreciate any kind of help.
Thx, Markus