Run pre-trained LLM model on CPU - ValueError: Expected a cuda device, but got: cpu

Hi, I am using an LLM, CohereForAI/c4ai-command-r-plus-4bit, for inference. I have a GPU, but it is not powerful enough, so I want to run the model on CPU. Below is my example code and the error it produces.

Code:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

PRETRAIN_MODEL = 'CohereForAI/c4ai-command-r-plus-4bit'
tokenizer = AutoTokenizer.from_pretrained(PRETRAIN_MODEL)
model = AutoModelForCausalLM.from_pretrained(PRETRAIN_MODEL, device_map='cpu')

text = "this is an example"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # A CausalLM output has logits, not last_hidden_state, so request hidden states explicitly
    outputs = model(**inputs, output_hidden_states=True)
    embedding = outputs.hidden_states[-1].mean(dim=1).squeeze().numpy()
print(embedding.shape)
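For what it's worth, the mean-pooling step itself runs fine on CPU with a stand-in tensor (dummy shapes below are made up for illustration), so the failure seems to happen while loading the 4-bit weights rather than in my pooling code:

```python
import torch

# Stand-in for model hidden states: batch=1, seq_len=4, hidden_size=8
dummy_hidden = torch.randn(1, 4, 8)

# Same pooling as in my script: average over the token dimension
embedding = dummy_hidden.mean(dim=1).squeeze().numpy()
print(embedding.shape)  # (8,)
```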

Error:

ValueError: Expected a cuda device, but got: cpu

transformers environment information:

  • transformers version: 4.40.0.dev0
  • Platform: Linux-5.4.0-150-generic-x86_64-with-glibc2.27
  • Python version: 3.11.8
  • Huggingface_hub version: 0.20.3
  • Safetensors version: 0.4.2
  • Accelerate version: 0.29.2
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.2 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: No
  • Using distributed or parallel set-up in script?: No

Does this mean the c4ai-command-r-plus-4bit model can only run on a GPU? Is there something I missed that would let it run on CPU? Thanks!