CUDA out of memory on Nvidia A10G + Codellama on HuggingFace Spaces

Hmm, I've never used ChatUI.
If I understand correctly, you can specify the model, right? Then you could at least try loading the 7B model. I don't think you can load it at lower precision, because those model parameters look like they're generation settings. But I'm not sure about that.
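
For what it's worth, outside of ChatUI you can load the 7B model at lower precision with plain `transformers`, which roughly halves the memory footprint versus fp32 and usually fits on a 24 GB A10G. A minimal sketch, assuming the `codellama/CodeLlama-7b-hf` checkpoint (swap in whichever repo you're actually using):

```python
# Minimal sketch: load CodeLlama 7B in half precision to reduce GPU memory.
# Requires `transformers` and `accelerate` (for device_map="auto").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"  # assumed checkpoint for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 weights: ~2 bytes/param instead of 4
    device_map="auto",          # place layers on the available GPU(s)
)

# Quick smoke test to confirm it generates without an OOM.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Whether ChatUI itself exposes a dtype/precision option I can't say, but if it ends up calling `from_pretrained` under the hood, something equivalent to the above is what you'd want it to do.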