Hello everyone😊,
I’d like to test the model in the free CPU environment; do you have any suggestions?
I’m encountering an error when trying to deploy the Qwen1.5-0.5B-Chat model in my Hugging Face Space running on the free, CPU-only hardware.
MyQwen1.5 0.5B Chat - a Hugging Face Space by funme
Thank you
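For context, here is a minimal sketch of how I understand CPU-only loading should look. This is an assumption on my part: the traceback suggests the app points at a GPTQ-quantized checkpoint (e.g. Qwen/Qwen1.5-0.5B-Chat-GPTQ-Int4), which needs a GPU, whereas the plain Qwen/Qwen1.5-0.5B-Chat checkpoint should load on CPU. The `load_chat_model` helper below is my own illustrative wrapper, not code from the Space:

```python
def load_chat_model(model_id: str = "Qwen/Qwen1.5-0.5B-Chat"):
    """Load tokenizer and model on CPU, full precision, no quantizer involved."""
    # Lazy imports so the helper can be defined without transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float32,  # CPU-safe dtype; avoids any GPU requirement
        device_map="cpu",
    )
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_chat_model()
    # Build a chat prompt and generate a short reply on CPU.
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": "Hello!"}],
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

If the quantized checkpoint is genuinely needed, I understand GPTQ models cannot run on the free CPU tier at all, so switching to the unquantized model (or GGUF via llama.cpp) seems to be the usual workaround.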
Here is the full log:
tokenizer_config.json: 0%| | 0.00/1.29k [00:00<?, ?B/s]
tokenizer_config.json: 100%|██████████| 1.29k/1.29k [00:00<00:00, 7.24MB/s]
vocab.json: 0%| | 0.00/2.78M [00:00<?, ?B/s]
vocab.json: 100%|██████████| 2.78M/2.78M [00:00<00:00, 27.1MB/s]
merges.txt: 0%| | 0.00/1.67M [00:00<?, ?B/s]
merges.txt: 100%|██████████| 1.67M/1.67M [00:00<00:00, 31.1MB/s]
tokenizer.json: 0%| | 0.00/7.03M [00:00<?, ?B/s]
tokenizer.json: 100%|██████████| 7.03M/7.03M [00:00<00:00, 58.3MB/s]
config.json: 0%| | 0.00/1.26k [00:00<?, ?B/s]
config.json: 100%|██████████| 1.26k/1.26k [00:00<00:00, 7.28MB/s]
Traceback (most recent call last):
File "/home/user/app/app.py", line 9, in <module>
model = AutoModelForCausalLM.from_pretrained(
File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 571, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 309, in _wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4389, in from_pretrained
hf_quantizer.validate_environment(
File "/usr/local/lib/python3.10/site-packages/transformers/quantizers/quantizer_gptq.py", line 65, in validate_environment
raise RuntimeError("GPU is required to quantize or run quantize model.")
RuntimeError: GPU is required to quantize or run quantize model.