Hey everyone,
I’m planning to fine-tune a 70B parameter model like LLaMA 3.1 locally. I know it needs around 280GB of VRAM for the model weights alone (at full precision), and more for gradients and activations. With 16GB-VRAM GPUs like the RTX 5070 Ti, that works out to about 18 GPUs just to hold the weights.
At $600 per GPU, that’s around $10,800 just for the GPUs.
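For reference, here’s the back-of-envelope math I’m using (my own rough numbers, not exact requirements):

```python
import math

# Back-of-envelope estimate; rough numbers, not exact requirements.
params = 70e9                # 70B parameters
bytes_per_param = 4          # full-precision (FP32) weights -> where ~280GB comes from
weights_gb = params * bytes_per_param / 1e9   # ~280 GB for the weights alone

gpu_vram_gb = 16             # e.g. RTX 5070 Ti
gpu_price_usd = 600          # assumed price per card

num_gpus = math.ceil(weights_gb / gpu_vram_gb)   # 280 / 16 = 17.5 -> 18 GPUs
total_cost = num_gpus * gpu_price_usd            # 18 * $600 = $10,800

print(f"{num_gpus} GPUs, ~${total_cost:,} total")
```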
Does that sound right, or am I missing something? Would love to hear from anyone who’s worked with large models like this!
That’s correct…
However, with 4-bit quantization the VRAM needed for the model weights drops to under 50GB. On top of that, PEFT (training small adapter weights instead of the full model) saves further VRAM and compute. There are also third-party trainers available that are both VRAM-efficient and fast.
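As a rough sketch of what that looks like with transformers, bitsandbytes, and peft (the model ID and LoRA hyperparameters below are placeholders, not recommendations):

```python
# Minimal QLoRA-style sketch: load the base model in 4-bit and attach LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
    bnb_4bit_use_double_quant=True,         # extra compression of quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B",             # placeholder; gated repo, requires access
    quantization_config=bnb_config,
    device_map="auto",                      # spread layers across available GPUs
)

lora_config = LoraConfig(
    r=16,                                   # illustrative LoRA settings
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the LoRA adapters are trained
```

With this setup only the small adapter weights get gradients and optimizer states, so the quantized base model is the main VRAM cost.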
For advice on hardware purchases, I recommend the Hugging Face Discord server.
Experienced users who have actually bought this kind of hardware are often around there…
I only have a mid-range GPU myself.