When I train a model with transformers and enable FSDP together with gradient accumulation, I found that there is no reduce-scatter in the backward pass until the last gradient accumulation step. Does anybody know why?
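For reference, here is a minimal single-process sketch of the accumulation schedule being described. FSDP itself is omitted (it needs a multi-GPU process group), and the `opt.step()` point stands in for where the per-window reduce-scatter would fire; my understanding is that Trainer/Accelerate typically defer gradient sync with the model's `no_sync()` context during the non-final micro-steps, which would match the observed behavior:

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
accum_steps = 4
sync_points = []  # micro-steps at which a sync/step would occur

for step, x in enumerate(torch.randn(8, 4)):
    loss = model(x).sum() / accum_steps  # scale so the accumulated grad matches the mean
    loss.backward()  # gradients add into .grad locally; no collective is needed yet
    if (step + 1) % accum_steps == 0:
        # Under FSDP via Trainer/Accelerate this is (presumably) where
        # no_sync() ends, so the reduce-scatter only appears here,
        # once per accumulation window.
        sync_points.append(step)
        opt.step()
        opt.zero_grad()

print(sync_points)  # two windows of 4 micro-batches
```

This reproduces the pattern in the question: communication (here, the stand-in sync point) happens only on the last micro-step of each window, not on every backward.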
Pinging @muellerzr here