Hi everyone! I am trying to deploy a 4-bit quantized LLM in a ZeroGPU Space, and I ran into problems when loading the pretrained model outside the method decorated with @spaces.GPU.
I posted a discussion on the zero-gpu org; can anyone help me with this issue?
Does ZeroGPU not support 4-bit quantization with bitsandbytes?
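In case it helps anyone hitting the same error: on ZeroGPU, CUDA is only available inside functions decorated with @spaces.GPU, and a bitsandbytes 4-bit load needs the GPU at load time, so one common workaround is to load the quantized model lazily inside the decorated function instead of at module level. Below is a minimal, untested sketch of that pattern; the model id, generation parameters, and quantization settings are placeholders, not something from the original post.

```python
# Hypothetical app.py sketch for a ZeroGPU Space (assumption: lazy-loading
# the 4-bit model inside the @spaces.GPU function avoids the import-time
# CUDA initialization that fails at module level).
import spaces
import torch
import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model id

# Tokenizer loading is CPU-only, so it is safe at module level.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = None  # loaded lazily, only once a GPU has been attached


@spaces.GPU
def generate(prompt: str) -> str:
    # The 4-bit (bitsandbytes) load happens here, inside the GPU context,
    # not at import time where no CUDA device exists yet.
    global model
    if model is None:
        model = AutoModelForCausalLM.from_pretrained(
            MODEL_ID,
            quantization_config=bnb_config,
            device_map="auto",
        )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)


gr.Interface(fn=generate, inputs="text", outputs="text").launch()
```

The trade-off is that the first request pays the model-load cost; caching the model in a global after the first call keeps subsequent requests fast.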