Loss spike when resuming from FSDP SHARDED_STATE_DICT checkpoint (possible optimizer-state mismatch)

4

I know the known issues regarding this.