load_in_8bit vs. loading an already-quantized 8-bit model

Use LoRA: it works on as little as a T4 (15 GB), since you fine-tune only the small adapter weights and then merge them back into the full model.
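A quick back-of-the-envelope sketch of why the adapter is so small (the dimensions below are hypothetical, roughly the size of one attention projection in a 7B-class model, not taken from any particular checkpoint): a rank-r LoRA pair adds r*(d_in + d_out) trainable parameters in place of the frozen d_in*d_out weight.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable params for one LoRA pair: A is (d_in x rank), B is (rank x d_out)."""
    return rank * (d_in + d_out)

# Hypothetical 4096x4096 projection matrix, rank-8 adapter.
full = 4096 * 4096                                  # frozen weight, stays in 8-bit
adapter = lora_trainable_params(4096, 4096, rank=8) # the only part you train

print(f"full: {full:,}  adapter: {adapter:,}  ratio: {adapter / full:.4%}")
# The adapter is well under 1% of the layer; after training, the product
# B @ A is added into (merged with) the frozen weight, so inference sees
# a single full-size matrix again.
```

At rank 8 the adapter is ~0.4% of the layer's parameters, which is why optimizer state and gradients fit comfortably next to an 8-bit base model on a 15 GB card.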