Trainer) training one batch with multiple GPUs

jeesoo · June 19, 2023, 2:24am

Hi there.
I’m using the Hugging Face Trainer to train a GPT-based large language model.
The model has more than 8B parameters.
When I use an input sequence length of 2048 tokens with per_device_train_batch_size=1, it doesn’t seem to fit on an A100 (40GB) GPU. How can I split a single batch across multiple GPUs? It seems that I ‘must’ load at least one full batch onto each GPU.
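For context, here is a rough back-of-the-envelope sketch of why this setup runs out of memory even at batch size 1. It assumes mixed-precision training with Adam, which is commonly estimated at about 16 bytes per parameter (fp16 weights and gradients, plus fp32 master weights and two Adam moments), and a model of exactly 8e9 parameters. Both are assumptions for illustration, not details from my actual run. Activation memory for the 2048-token sequence is ignored and would come on top.

```python
import math

# Assumed per-parameter cost for mixed-precision Adam training:
# fp16 weights (2) + fp16 grads (2) + fp32 master weights (4)
# + Adam momentum (4) + Adam variance (4) = 16 bytes.
BYTES_PER_PARAM = 16


def training_state_gb(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Rough memory for weights, gradients, and optimizer states, in GB (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus_if_sharded(num_params, gpu_gb, bytes_per_param=BYTES_PER_PARAM):
    """Lower bound on GPU count if the model states were split evenly across GPUs.

    Ignores activations, buffers, and fragmentation, so the real number is higher.
    """
    return math.ceil(training_state_gb(num_params, bytes_per_param) / gpu_gb)


num_params = 8e9   # assumed: "more than 8b" parameters
gpu_gb = 40        # A100 40GB

state_gb = training_state_gb(num_params)
print(f"model states: ~{state_gb:.0f} GB vs {gpu_gb} GB per GPU")
print(f"GPUs needed just for model states: >= {min_gpus_if_sharded(num_params, gpu_gb)}")
```

If this estimate is roughly right, the batch is not the main problem: the model states alone exceed a single 40GB card, so what has to be spread over several GPUs is the model itself rather than the one batch.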
