Saving unique weights while training on multiple GPU - Trainer

Hello,

I am training a LoRA adaptation of a T5 model in a single-machine, multi-GPU setup.
I am using Transformers 4.26.1 and DeepSpeed 0.9.2 and launching my script with deepspeed (so the parallelization setup is Distributed Data Parallel).

I am using a customized callback in the Trainer to save only the LoRA weights at each epoch. Unfortunately, since I am using multiple GPUs, the script is run in parallel four times (the number of GPUs I use), so at the end of each epoch the weights are saved four times. Is there anything I can do to save the weights only once?
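For reference, my callback looks roughly like the sketch below. This is a simplified version: the class name, the output path, and the way the LoRA parameters are selected (filtering on "lora" in the parameter names) are just illustrative, not exactly what I have in my script.

```python
import torch
from transformers import TrainerCallback


class SaveLoraCallback(TrainerCallback):
    """Save only the LoRA adapter weights at the end of each epoch."""

    def on_epoch_end(self, args, state, control, **kwargs):
        model = kwargs["model"]
        # Keep only the parameters belonging to the LoRA adapters
        # (assumes the adapter parameters have "lora" in their names).
        lora_state_dict = {
            name: param.detach().cpu()
            for name, param in model.named_parameters()
            if "lora" in name
        }
        # This runs on every process, so with 4 GPUs the file is written 4 times.
        torch.save(
            lora_state_dict,
            f"{args.output_dir}/lora_epoch_{int(state.epoch)}.bin",
        )
```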

Thanks for your help,

Lucius