Duplicated log output across processes when using the accelerate CLI with 'multi_gpu' to run DDP, without using accelerator.print

I got the following results, where each training step is logged once per process. How should I synchronize these outputs so that each step is logged only once, without using accelerator.print()? (A sketch of the kind of main-process-only logging I mean is after the log output below.) When I run the same code with plain PyTorch DDP, everything works fine.

Epoch(train) [1]/[1000] [1/607]  lr: 2.000e-05, eta: 32 days, 11:04:36, time: 4.621, data_time: 1.748, memory: 10783MB, loss: 1.83806, grad_norm: 20.63219
Epoch(train) [1]/[1000] [1/607]  lr: 2.000e-05, eta: 43 days, 0:00:09, time: 6.121, data_time: 1.765, memory: 10783MB, loss: 1.82934, grad_norm: 20.23274
Epoch(train) [1]/[1000] [2/607]  lr: 2.000e-05, eta: 28 days, 8:44:18, time: 3.454, data_time: 0.880, memory: 12096MB, loss: 1.50286, grad_norm: 20.35895
Epoch(train) [1]/[1000] [3/607]  lr: 2.000e-05, eta: 25 days, 22:37:27, time: 3.003, data_time: 0.588, memory: 12096MB, loss: 1.28645, grad_norm: 19.10999
Epoch(train) [1]/[1000] [2/607]  lr: 2.000e-05, eta: 38 days, 18:26:38, time: 4.916, data_time: 0.890, memory: 12096MB, loss: 1.48691, grad_norm: 20.24583
Epoch(train) [1]/[1000] [4/607]  lr: 2.000e-05, eta: 24 days, 9:15:53, time: 2.806, data_time: 0.443, memory: 12096MB, loss: 1.14824, grad_norm: 18.33704
Epoch(train) [1]/[1000] [5/607]  lr: 2.000e-05, eta: 23 days, 2:50:05, time: 2.569, data_time: 0.354, memory: 12096MB, loss: 1.03992, grad_norm: 17.41260
Epoch(train) [1]/[1000] [3/607]  lr: 2.000e-05, eta: 35 days, 22:56:00, time: 4.317, data_time: 0.597, memory: 12096MB, loss: 1.26814, grad_norm: 18.32812
Epoch(train) [1]/[1000] [6/607]  lr: 2.000e-05, eta: 22 days, 4:54:31, time: 2.510, data_time: 0.296, memory: 12096MB, loss: 0.95433, grad_norm: 16.64126
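
For reference, this is a minimal sketch of the main-process-only logging I am asking about, using Accelerate's `accelerate.logging.get_logger` and the `is_main_process` flag. The `Linear` model, loss, and step count are placeholders, not my actual training code:

```python
import logging

import torch
from accelerate import Accelerator
from accelerate.logging import get_logger

logging.basicConfig(level=logging.INFO)

# Accelerate's logger only emits on the main process by default,
# so each step would be logged once instead of once per GPU.
logger = get_logger(__name__)

accelerator = Accelerator()

model = torch.nn.Linear(8, 1)                      # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model, optimizer = accelerator.prepare(model, optimizer)

for step in range(1, 4):                           # placeholder loop
    inputs = torch.randn(4, 8, device=accelerator.device)
    loss = model(inputs).pow(2).mean()
    accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()

    # Logged once per step, on the main process only.
    logger.info(f"step {step}: loss={loss.item():.5f}")

    # Equivalent guard for an ordinary print:
    if accelerator.is_main_process:
        print(f"step {step}: loss={loss.item():.5f}")
```

Note that logging this way only reports rank 0's loss; if the value should be averaged over processes, it presumably has to be reduced first (e.g. with accelerator.reduce or accelerator.gather) before logging.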