How to use FSDP + DDP in Trainer

Hi - I want to train a model on a large number of GPUs (e.g. 256). I want 4-way data parallelism (DDP) that replicates the full model, and within each replica use FSDP to shard the model across 64 GPUs. Is there a code example?
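For concreteness, this is roughly what I mean in native PyTorch (just a sketch, assuming PyTorch >= 2.2 for the device-mesh API and a `torchrun` launch across 256 ranks; the tiny `Linear` is only a stand-in for the real model):

```python
import os

import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

# Launched with torchrun across 256 ranks (e.g. 32 nodes x 8 GPUs).
dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# 2D mesh: dim 0 = 4 replicas (DDP-style gradient all-reduce between them),
#          dim 1 = 64 ranks that each replica's parameters are sharded over.
mesh = init_device_mesh("cuda", (4, 64), mesh_dim_names=("replicate", "shard"))

model = torch.nn.Linear(1024, 1024).cuda()  # stand-in for the real model

# HYBRID_SHARD = FSDP sharding inside each 64-rank group,
# replication across the 4 groups.
model = FSDP(
    model,
    device_mesh=mesh,
    sharding_strategy=ShardingStrategy.HYBRID_SHARD,
)
```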

I know how to write this in native PyTorch (roughly as sketched above), but how do I do it with the Trainer? Is it supported?
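In case it helps, the closest thing I have found so far is the `hybrid_shard` FSDP option in `TrainingArguments` (again only a sketch, assuming a recent transformers/accelerate stack that accepts this option; the model name, dummy dataset, and wrapped layer class are placeholders):

```python
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "gpt2"  # placeholder; swap in the real model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tiny dummy dataset just so the sketch runs end to end.
ds = Dataset.from_dict({"text": ["hello world"] * 64})
ds = ds.map(
    lambda b: tokenizer(b["text"], truncation=True, padding="max_length", max_length=32),
    batched=True,
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    # "hybrid_shard" should map to FSDP's ShardingStrategy.HYBRID_SHARD:
    # shard within a replica group, DDP-style all-reduce across groups.
    fsdp="hybrid_shard auto_wrap",
    fsdp_config={
        # On older transformers versions this key is "fsdp_transformer_layer_cls_to_wrap".
        "transformer_layer_cls_to_wrap": ["GPT2Block"],
    },
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

This would be launched with something like `torchrun --nnodes 32 --nproc_per_node 8 train.py` or `accelerate launch`. My understanding is that plain HYBRID_SHARD shards within each node and replicates across nodes by default, so getting exactly 64-way sharding with 4 replicas probably needs custom process groups or a device mesh, which I don't think `TrainingArguments` exposes directly. Corrections welcome.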

Did you figure it out?