Hi, I’m trying to train my model with a large effective batch size.
Can I use gradient checkpointing and gradient accumulation at the same time?
I’m not sure whether gradients are still accumulated correctly when checkpointing is enabled.
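
For concreteness, here is a minimal single-device sketch of the setup I have in mind. The toy model, the sizes, and names like `block`, `head`, and `accum_steps` are made up for illustration; my real model is different. It assumes a PyTorch version where `torch.utils.checkpoint.checkpoint` accepts `use_reentrant=False`.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

torch.manual_seed(0)

# Hypothetical toy model and sizes, for illustration only.
block = nn.Sequential(nn.Linear(8, 8), nn.ReLU())
head = nn.Linear(8, 1)
params = list(block.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)

accum_steps = 4  # number of micro-batches per optimizer step
micro_batches = [(torch.randn(2, 8), torch.randn(2, 1)) for _ in range(accum_steps)]

optimizer.zero_grad()
for x, y in micro_batches:
    # Checkpointing: activations inside `block` are recomputed during
    # backward instead of being stored in the forward pass.
    hidden = checkpoint(block, x, use_reentrant=False)
    loss = nn.functional.mse_loss(head(hidden), y)
    # Accumulation: each backward() adds into .grad; dividing by accum_steps
    # makes the sum equal to the mean loss over all (equal-sized) micro-batches.
    (loss / accum_steps).backward()

# Snapshot of the accumulated gradients before the single optimizer step.
accum_grads = [p.grad.clone() for p in params]
optimizer.step()
```

What I'd like to confirm is whether `accum_grads` here is the same as what I'd get without the `checkpoint` call.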
P.S.: Would it also be okay to combine multi-GPU training, gradient checkpointing, and gradient accumulation all at once?