Hi, I am using an LLM, CohereForAI/c4ai-command-r-plus-4bit, for inference. I have a GPU, but it is not powerful enough, so I want to run the model on the CPU. Below is my example code and the problem I ran into.
Code:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

PRETRAIN_MODEL = 'CohereForAI/c4ai-command-r-plus-4bit'
tokenizer = AutoTokenizer.from_pretrained(PRETRAIN_MODEL)
model = AutoModelForCausalLM.from_pretrained(PRETRAIN_MODEL, device_map='cpu')

text = "this is an example"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)
# CausalLM outputs have no last_hidden_state; mean-pool the final hidden layer instead
embedding = outputs.hidden_states[-1].mean(dim=1).squeeze().numpy()
print(embedding.shape)
Error:
ValueError: Expected a cuda device, but got: CPU
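I suspect the checkpoint itself ships a bitsandbytes 4-bit quantization config that forces CUDA at load time. Here is a sketch of how I could check that (assuming the checkpoint's config file exposes it as quantization_config):

from transformers import AutoConfig

config = AutoConfig.from_pretrained('CohereForAI/c4ai-command-r-plus-4bit')
# If the checkpoint is pre-quantized, this should print a bitsandbytes 4-bit setup
print(getattr(config, 'quantization_config', None))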
Transformers environment information:
- transformers version: 4.40.0.dev0
- Platform: Linux-5.4.0-150-generic-x86_64-with-glibc2.27
- Python version: 3.11.8
- Huggingface_hub version: 0.20.3
- Safetensors version: 0.4.2
- Accelerate version: 0.29.2
- Accelerate config: not found
- PyTorch version (GPU?): 2.2.2 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Does this mean that the c4ai-command-r-plus-4bit model can only run on a GPU? Is there anything I missed that would let it run on the CPU? Thanks!
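In case the 4-bit checkpoint really is GPU-only, here is a sketch of the fallback I have in mind: loading the unquantized checkpoint (I assume that is CohereForAI/c4ai-command-r-plus) on the CPU in full precision. Would that be the recommended route?

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL = 'CohereForAI/c4ai-command-r-plus'  # unquantized variant (my assumption)
tokenizer = AutoTokenizer.from_pretrained(MODEL)
# float32 on CPU; note the full model will need a large amount of RAM
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map='cpu', torch_dtype=torch.float32)

inputs = tokenizer("this is an example", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)
# Mean-pool the final hidden layer into a single embedding vector
embedding = outputs.hidden_states[-1].mean(dim=1).squeeze().numpy()
print(embedding.shape)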