CPU generate is only using 15% CPU (LLaMA 13B)

My code looks like this:

```python
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

tokenizer = LlamaTokenizer.from_pretrained("/path/to/model")
model = LlamaForCausalLM.from_pretrained("/path/to/model")

prompt = "prompt text"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample up to 1500 tokens on the CPU
generate_ids = model.generate(inputs.input_ids, max_length=1500, temperature=0.7, do_sample=True)
text = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
```

For about the first 15 seconds it uses 50% CPU, then it drops to 15% CPU until it's done generating.
I would like it to use 100% CPU, since that should be roughly 6x faster.

I tried googling this problem, but all I could find were people trying to use the CPU instead of the GPU, or people trying to restrict a run to a specific number of CPU cores/threads.
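One thing worth checking (just a diagnostic sketch, not a confirmed cause): PyTorch picks an intra-op thread count at startup, and if that cap is lower than your core count, ops like the matmuls inside `generate` can't saturate the CPU. `torch.get_num_threads` / `torch.set_num_threads` are the real PyTorch APIs for this; the core count of 8 below is just a placeholder for your machine's value.

```python
import torch

# How many threads PyTorch currently uses for intra-op parallelism
# (the parallelism inside a single op, e.g. a matmul).
print("intra-op threads:", torch.get_num_threads())

# If this is lower than your physical core count, raise it.
# 8 is an assumed core count -- substitute your own.
torch.set_num_threads(8)
print("intra-op threads now:", torch.get_num_threads())
```

Run this before loading the model, then watch CPU usage during `generate` again.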

In case it's relevant: I'm running Linux Mint 20, this machine has nothing installed except transformers and JupyterLab, I installed transformers in a venv, and I'm using PyTorch.
