I understand you want to run inference with a 32B-parameter model. If it doesn't fit entirely in GPU memory, you can offload part of the weights to CPU RAM or disk.
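For example, Hugging Face Transformers (with Accelerate installed) can split the model across GPU, CPU, and disk automatically. A minimal sketch, assuming a causal LM checkpoint on the Hub; the model ID below is a placeholder:

```python
# Sketch: load a 32B model in FP16 and let Accelerate offload whatever
# doesn't fit on the GPU to CPU RAM and, if needed, to disk.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/some-32b-model"  # placeholder, substitute your checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # halves weight memory vs. FP32
    device_map="auto",           # place layers on GPU/CPU as capacity allows
    offload_folder="offload",    # spill any remaining layers to disk
)

inputs = tokenizer("Hello!", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Offloading trades speed for capacity, so first check how much memory the weights themselves need: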
FP32:
32 billion parameters * 4 bytes/parameter = 128 billion bytes
128 billion bytes / (1024 * 1024 * 1024) ≈ 119 GiB (about 120 GB)
FP16:
32 billion parameters * 2 bytes/parameter = 64 billion bytes
64 billion bytes / (1024 * 1024 * 1024) ≈ 60 GiB
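The same arithmetic as a few lines of Python, in case you want to plug in other parameter counts or precisions:

```python
# Weight-only memory footprint of a 32B-parameter model at different precisions.
PARAMS = 32_000_000_000  # 32 billion parameters

for dtype, bytes_per_param in [("FP32", 4), ("FP16", 2)]:
    total_bytes = PARAMS * bytes_per_param
    print(f"{dtype}: {total_bytes / 1024**3:.1f} GiB")

# Prints roughly: FP32: 119.2 GiB, FP16: 59.6 GiB
```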
Note:
This calculation only considers the memory required to store the model parameters themselves.
In reality, you’ll also need memory for:
- Activations during inference (including the KV cache, which grows with batch size and context length)
- Optimizer states (if training)
- Intermediate calculations
- System overhead
This means the actual memory requirement will be significantly higher than the weights-only figures above.
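To give a feel for one of the larger inference-time extras, here is a rough KV-cache estimate. Every architecture number in it (64 layers, 8 KV heads, head dimension 128, FP16 cache) is an assumption standing in for a typical 32B-class model with grouped-query attention; substitute your model's actual config values.

```python
# Rough KV-cache estimate with assumed architecture values (not taken from
# any specific model config): 64 layers, 8 KV heads, head_dim 128, FP16 cache.
num_layers = 64
num_kv_heads = 8
head_dim = 128
bytes_per_elem = 2       # FP16
seq_len = 8192           # context length
batch_size = 1

# Keys + values (factor of 2), per layer, per token, per sequence.
bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
total = bytes_per_token * seq_len * batch_size
print(f"KV cache: {total / 1024**3:.1f} GiB")  # ~2.0 GiB under these assumptions
```

A model without grouped-query attention would need several times more for the same context, and longer contexts or larger batches scale the cache linearly, so budget beyond the weight figures alone.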