Hugging Face Forums
Offloading LLM models to CPU uses only single core
🤗Transformers
rhwauiy89
June 3, 2024, 6:13am
I’m also having the same problem with Mistral 7B; you can try BetterTransformer.
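The thread is about CPU offload using only a single core. Separate from the BetterTransformer suggestion, one common cause is PyTorch's intra-op thread count being pinned to 1 (e.g. via `OMP_NUM_THREADS`). A minimal sketch of checking and raising it, assuming only that PyTorch is installed:

```python
import os
import torch

# PyTorch's intra-op parallelism controls how many CPU cores ops like
# matmul use. If offloaded layers run on one core, the thread count may
# have been pinned to 1 (e.g. by OMP_NUM_THREADS in the environment).
torch.set_num_threads(os.cpu_count() or 1)

# Verify the setting took effect before running inference.
print(torch.get_num_threads())
```

This only affects CPU-side computation; it must be set before heavy inference starts, and whether it helps depends on how the offloaded layers are executed.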