Hello everyone😊, I’d like to test the model on the free CPU environment—do you have any suggestions? I’m encountering an error when trying to deploy the Qwen1.5-0.5B-Chat model in my Hugging Face Space running on CPU-only (free) . MyQwen1.5 0.5B Chat - a Hugging Face Space by funme Thank you …

[image] John6666: Qwen/Qwen1.5-0.5B-Chat Thank you😊 , I need a model size smaller than 700 MB , I’m going to change model, if I can’t use this model

[RuntimeError] GPU is required to quantize or run quantize model – Qwen1.5-0.5B-Chat in my Space

John6666 May 23, 2025, 3:57pm 2

It may be possible to use a quantized model in a CPU environment, but it would probably be faster to simply use a non-quantized model in this case.

#MODEL_ID = "Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4"
MODEL_ID = "Qwen/Qwen1.5-0.5B-Chat"

Topic		Replies	Views
Loading quantized model on CPU only 🤗Transformers	6	18504	February 3, 2025
Unable to run quantized model with Zero GPU space Spaces	0	102	June 11, 2024
How to load quantized LLM to CPU only device Intermediate	0	1937	January 28, 2024
GPTQ and AWQ quantized model doesn't work Beginners	0	143	February 19, 2024
Using gpt-j-6B in a CPU space without the InferenceAPI Spaces	0	2280	January 28, 2022