Hugging Face Forums
Loss spike when resuming from FSDP SHARDED_STATE_DICT checkpoint (possible optimizer-state mismatch)
🤗Accelerate
John6666
June 28, 2025, 7:05am
I'm aware of some known issues related to this.