Hi everyone! I am trying to deploy a 4-bit quantized LLM in a ZeroGPU Space, and I ran into problems when loading the pretrained model outside the method decorated with @spaces.GPU.
I posted a discussion on the zero-gpu org; can anyone help me with this issue?
Does ZeroGPU not support 4-bit quantization with bitsandbytes?
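In case it helps anyone hitting the same error: on ZeroGPU, CUDA is only available inside functions decorated with @spaces.GPU, and a bitsandbytes 4-bit load needs the GPU at load time, so one common workaround is to load the quantized model lazily inside the decorated function instead of at module level. Below is a minimal, untested sketch of that pattern; the model id, generation parameters, and quantization settings are placeholders, not something from the original post.

```python
# Hypothetical app.py sketch for a ZeroGPU Space (assumption: lazy-loading
# the 4-bit model inside the @spaces.GPU function avoids the import-time
# CUDA initialization that fails at module level).
import spaces
import torch
import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model id

# Tokenizer loading is CPU-only, so it is safe at module level.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = None  # loaded lazily, only once a GPU has been attached


@spaces.GPU
def generate(prompt: str) -> str:
    # The 4-bit (bitsandbytes) load happens here, inside the GPU context,
    # not at import time where no CUDA device exists yet.
    global model
    if model is None:
        model = AutoModelForCausalLM.from_pretrained(
            MODEL_ID,
            quantization_config=bnb_config,
            device_map="auto",
        )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


gr.Interface(fn=generate, inputs="text", outputs="text").launch()
```

The trade-off is that the first request pays the model-load cost; caching the model in a global after the first call keeps subsequent requests fast.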