CUDA out of memory on Nvidia A10G + Codellama on HuggingFace Spaces

Hmm, I've never used ChatUI.
If I understand correctly, you can specify the model, right? Then you could at least try loading the 7B model. I don't think you can load it at lower precision, because those model parameters look like they're generation settings. But I'm not sure about that.
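
For what it's worth, outside of ChatUI you can load the 7B model at lower precision with plain `transformers`, which roughly halves the memory footprint versus fp32 and usually fits on a 24 GB A10G. A minimal sketch, assuming the `codellama/CodeLlama-7b-hf` checkpoint (swap in whichever repo you're actually using):

```python
# Minimal sketch: load CodeLlama 7B in half precision to reduce GPU memory.
# Requires `transformers` and `accelerate` (for device_map="auto").
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"  # assumed checkpoint for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 weights: ~2 bytes/param instead of 4
    device_map="auto",          # place layers on the available GPU(s)
)

# Quick smoke test to confirm it generates without an OOM.
prompt = "def fibonacci(n):"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Whether ChatUI itself exposes a dtype/precision option I can't say, but if it ends up calling `from_pretrained` under the hood, something equivalent to the above is what you'd want it to do.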