I was wondering if there’s a way to accumulate the loss over multiple steps before backpropagating and taking an optimizer step while using the Trainer API? It’s easy to run out of CUDA memory, and I would like to avoid that.
If you are talking about gradient accumulation, you can set it with gradient_accumulation_steps=xxx in your TrainingArguments.
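For example, a minimal sketch of how this could look (the values, output directory, and the `model` / `train_dataset` objects are placeholders you would replace with your own):

```python
from transformers import Trainer, TrainingArguments

# Effective batch size = per_device_train_batch_size * gradient_accumulation_steps.
# Here: 4 * 8 = 32, while only 4 samples are held on the GPU at a time.
training_args = TrainingArguments(
    output_dir="./results",           # placeholder checkpoint directory
    per_device_train_batch_size=4,    # small per-step batch to fit in GPU memory
    gradient_accumulation_steps=8,    # accumulate gradients over 8 forward/backward passes
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,                # your model, assumed to be defined elsewhere
    args=training_args,
    train_dataset=train_dataset # your tokenized dataset, assumed to be defined elsewhere
)

trainer.train()
```

The Trainer only calls the optimizer (and zeroes the gradients) every `gradient_accumulation_steps` batches, so you get the same effective batch size with a much smaller memory footprint per step.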