Unable to run a quantized model in a Zero GPU Space

Hi everyone! I am trying to deploy a 4-bit quantized LLM in a Zero GPU Space, and I run into problems when loading the pretrained model outside the function decorated with @spaces.GPU.
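For context, here is a minimal sketch of what I am doing (the model ID is just a placeholder, and the exact generation parameters are illustrative). The module-level `from_pretrained` call, outside the @spaces.GPU function, is where it breaks for me:

```python
import spaces
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "my-org/my-7b-model"  # placeholder model ID

# Standard bitsandbytes 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Loading here, at module level (i.e. OUTSIDE the @spaces.GPU function),
# is the step that fails in the Zero GPU Space:
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

@spaces.GPU
def generate(prompt: str) -> str:
    # GPU is only attached while this decorated function runs
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```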

I also posted a discussion in the zero-gpu org. Can anybody help me with this issue?

Does Zero GPU not support 4-bit quantization with bitsandbytes?