Multiple GPU training

Can I ask whether it's possible to do multi-GPU training when the whole model doesn't fit on a single GPU once loaded? For example, I'm training Llama 3.1 8B in full precision with the Hugging Face Trainer on 4 GPUs with 16 GB of VRAM each. The model takes up about 32 GB when loaded (8B parameters × 4 bytes in fp32), so each GPU would hold about 8 GB of it (4 × 8 GB = 32 GB).
When I run the training, the number of steps equals (dataset length) × (number of epochs) / (batch size). If the training were distributed, that would additionally be divided by the number of GPUs.
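For instance, with made-up numbers: 10,000 examples × 2 epochs / batch size of 4 = 5,000 steps on a single GPU, and under data parallelism on my 4 GPUs I'd expect 5,000 / 4 = 1,250 steps.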
So is this even possible with this hardware, and if so, is there a way to set it up with the Hugging Face Trainer? I'm running the code in a Jupyter notebook, and even when I load a model that does fit on a single GPU, the training never starts in distributed mode.

You can take a look at [Efficient Training on Multiple GPUs](https://huggingface.co/docs/transformers/perf_train_gpu_many).
If I understood your setup correctly, you are looking at Case 2: Your model doesn't fit onto a single GPU. For that case the docs point to ZeRO (via DeepSpeed), pipeline parallelism, or tensor parallelism.
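One concrete option from that page is DeepSpeed ZeRO stage 3, which shards parameters, gradients, and optimizer states across your 4 GPUs (optionally offloading to CPU RAM) so no single card ever has to hold the full 32 GB. A minimal sketch, assuming `transformers` and `deepspeed` are installed; the checkpoint id, `my_dataset`, and the hyperparameters are placeholders, not a tested recipe:

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

# ZeRO stage 3: shard params, grads, and optimizer states across all
# ranks; CPU offload pushes them to system RAM when not in use, which
# full-precision training on 16 GB cards will almost certainly need.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    },
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

model_id = "meta-llama/Llama-3.1-8B"  # adjust to the exact checkpoint you use
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    num_train_epochs=1,
    deepspeed=ds_config,  # accepts a dict or a path to a JSON config file
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=my_dataset,  # placeholder: your tokenized dataset
)
trainer.train()
```

The catch for your notebook question: the Trainer only runs distributed when the script is started by a multi-process launcher, e.g. `torchrun --nproc_per_node 4 train.py` or `accelerate launch train.py`. Executed as a plain notebook cell, it sees one process and one GPU, which is why your runs never start distributed.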
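If you want to stay inside Jupyter, `accelerate` ships a `notebook_launcher` that spawns the worker processes for you. Note this covers plain DDP, which replicates the model on each card, so it addresses your second observation (a model that does fit never training distributed) rather than the 8B-in-fp32 case; the DeepSpeed path is usually launched from the command line. A sketch, where `train_fn` is a hypothetical function wrapping the Trainer setup above:

```python
from accelerate import notebook_launcher

def train_fn():
    # hypothetical wrapper: build the model, TrainingArguments, and
    # Trainer inside this function, then call trainer.train().
    # Each spawned process runs it independently, and CUDA must not
    # have been touched in the notebook before the launcher is called.
    ...

notebook_launcher(train_fn, num_processes=4)  # one process per GPU
```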