Can we use Gradient Checkpointing and Gradient Accumulation at Once?

Hi, I’m trying to train my model with a large batch size,

So can I use Gradient Checkpointing and Gradient Accumulation at once?

I’m not sure whether the gradients are safely accumulated when checkpointing is used.

P.S.: Would it be okay to use multi-GPU + Gradient Checkpointing + Gradient Accumulation at once?

Yes, those two techniques can be used together, and with distributed training as well. Gradient checkpointing only changes *when* activations are computed (they are recomputed during the backward pass instead of being stored in the forward pass); the gradients it produces are numerically the same as without checkpointing, so accumulating them across micro-batches is safe.
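
Here is a minimal PyTorch/Transformers sketch of the two combined. The checkpoint name, dummy batches, and hyperparameters are placeholders for illustration, not a recommendation for your setup:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "bert-base-uncased"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)
model.gradient_checkpointing_enable()  # recompute activations in backward to save memory
model.train()

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
accumulation_steps = 4  # effective batch size = micro-batch size * 4

# Dummy micro-batches standing in for a real DataLoader.
texts = ["example sentence"] * 8
batches = [tokenizer(texts[i:i + 2], return_tensors="pt", padding=True)
           for i in range(0, len(texts), 2)]

optimizer.zero_grad()
for step, batch in enumerate(batches):
    labels = torch.zeros(batch["input_ids"].shape[0], dtype=torch.long)
    # Scale the loss so the accumulated gradient matches one large batch.
    loss = model(**batch, labels=labels).loss / accumulation_steps
    loss.backward()  # gradients from the checkpointed recomputation add into .grad
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

For multi-GPU with DistributedDataParallel, the same loop works unchanged. One efficiency note: DDP all-reduces gradients on every `backward()` by default, so you can wrap the intermediate micro-batch backward passes in the `model.no_sync()` context manager and synchronize only on the step where you actually call `optimizer.step()`.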