Accelerator.save_state errors out due to timeout. Unable to increase timeout through kwargs_handlers

I am facing the same problem: calling accelerator.save_state() with FSDP wrapping times out. I call it on all processes, since from what I have seen that is the intended way to do it. Has anyone found a workaround?
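For reference, the kwargs_handlers route mentioned in the title is normally written along these lines. This is only a minimal sketch: the helper names `collective_timeout` and `build_accelerator` and the 90-minute value are made up for illustration, and it assumes `accelerate` is installed. Whether the longer timeout actually takes effect for save_state() under FSDP is exactly the open question in this thread.

```python
from datetime import timedelta


def collective_timeout(minutes: int) -> timedelta:
    """Hypothetical helper: build the timeout value for the process group."""
    return timedelta(minutes=minutes)


def build_accelerator(timeout_minutes: int = 90):
    """Hypothetical helper: create an Accelerator with a longer collective timeout.

    Imports are deferred so this module can be loaded without accelerate installed.
    """
    from accelerate import Accelerator
    from accelerate.utils import InitProcessGroupKwargs

    # InitProcessGroupKwargs forwards `timeout` to torch.distributed.init_process_group.
    handler = InitProcessGroupKwargs(timeout=collective_timeout(timeout_minutes))
    return Accelerator(kwargs_handlers=[handler])
```

The handler has to be passed when the Accelerator is first constructed, since the process group is initialized at that point; building it later in the script would presumably not change the timeout of an already-initialized group.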
