Hi @rranjan1, could you check whether the following works?
import time
from datetime import timedelta
from accelerate import Accelerator, InitProcessGroupKwargs
from torch import tensor
kwargs = [InitProcessGroupKwargs(timeout=timedelta(seconds=10))]
accelerator = Accelerator(kwargs_handlers=kwargs)
if accelerator.is_main_process:
    t = tensor(0).to(accelerator.device)
    # Delay only the main process so the other ranks have to wait at the barrier.
    time.sleep(8)
else:
    t = tensor(0).to(accelerator.device)
accelerator.wait_for_everyone()
print("All called!")
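With a timeout of 4 seconds it should fail, because the main process sleeps for 8 seconds before it reaches the barrier. With a timeout of 10 seconds, as set here, it should work. I just want to see whether the timeout is actually being changed as expected.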
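If it helps, here is a rough sketch of a helper for measuring how long the barrier actually waits, so the number can be compared against the configured timeout. `time_call` is a hypothetical name, not part of accelerate, and it assumes the timeout surfaces as a Python exception, which may depend on the backend.

```python
import time


def time_call(fn):
    """Run fn() and return (elapsed_seconds, exception_or_None).

    Hypothetical helper for this test only; it is not an accelerate API.
    """
    start = time.perf_counter()
    error = None
    try:
        fn()
    except Exception as exc:  # assumption: the timeout is raised as an exception
        error = exc
    return time.perf_counter() - start, error


# Usage in the script above, in place of the bare barrier call:
#   elapsed, error = time_call(accelerator.wait_for_everyone)
#   print(f"waited {elapsed:.1f}s, error={error!r}")
```

On the ranks that wait, the elapsed time should be close to the sleep when the barrier succeeds, and roughly the configured timeout when it fails.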