How much VRAM and how many GPUs to fine-tune a 70B parameter model like LLaMA 3.1 locally?

Hey everyone,

I’m planning to fine-tune a 70B parameter model like LLaMA 3.1 locally. I know it needs around 280GB of VRAM for the model weights alone (70B parameters × 4 bytes in fp32), and more on top of that for gradients, optimizer states, and activations. With a 16GB GPU like the RTX 5070 Ti, that works out to about 18 GPUs just to hold the weights.

At $600 per GPU, that’s around $10,800 just for the GPUs.
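
Here’s the back-of-the-envelope math I’m working from (just a sketch; the 16GB capacity and $600 price are my own assumptions, and it only counts the weights):

```python
# Rough VRAM / GPU-count estimate for a 70B model (weights only).
# The 16GB card size and $600 price are assumptions, not quotes.

PARAMS = 70e9
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "nf4": 0.5}

GPU_VRAM_GB = 16      # e.g. an RTX 5070 Ti class card
GPU_PRICE_USD = 600   # assumed price per card

for dtype, nbytes in BYTES_PER_PARAM.items():
    weights_gb = PARAMS * nbytes / 1e9
    gpus_needed = -(-weights_gb // GPU_VRAM_GB)  # ceiling division
    print(f"{dtype}: ~{weights_gb:.0f} GB weights -> "
          f"{gpus_needed:.0f} x {GPU_VRAM_GB}GB GPUs "
          f"(~${gpus_needed * GPU_PRICE_USD:,.0f})")
```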

Does that sound right, or am I missing something? Would love to hear from anyone who’s worked with large models like this!

That’s correct…

However, with 4-bit quantization the VRAM needed for the model weights drops to under 50GB (roughly 0.5 bytes per parameter, about 35GB, plus quantization overhead). Additionally, using PEFT (e.g., LoRA) saves further VRAM and compute, since gradients and optimizer states only exist for the small adapter weights; see the sketch below. There are also third-party trainers (Unsloth and Axolotl are commonly mentioned) that are both VRAM-efficient and fast.
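
If it helps, here is a minimal QLoRA-style sketch using the transformers / peft / bitsandbytes stack (the checkpoint name and LoRA hyperparameters are placeholders, not recommendations):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-3.1-70B"  # placeholder; gated repo, swap for your checkpoint

# 4-bit NF4 quantization: weights take ~0.5 bytes/param (~35GB plus overhead).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # shard the quantized weights across available GPUs
)

# PEFT/LoRA: only the small adapter matrices get gradients and optimizer states.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the 70B
```

From there the model can go into a standard Trainer or TRL’s SFTTrainer, and only the small adapter weights are saved at the end.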

For hardware purchase advice, I’d recommend the Hugging Face Discord server.
Experienced users who have actually bought this kind of hardware hang out there…
I only have a mid-range GPU myself.
