Issues with Dataset Loading and Checkpoint Saving using FSDP with HuggingFace Trainer on SLURM Multi-Node Setup

This appears to be an unresolved issue, but there may be a workaround, such as downgrading the accelerate library.
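
If you try the downgrade, it may help to record which versions are actually installed on each node before and after, since it is not stated here which accelerate release is affected. Below is a minimal sketch using only the Python standard library. The function name `installed_versions` and the default package list are my own choices for illustration and are not taken from the original report.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("accelerate", "transformers", "torch")):
    """Return a dict mapping each package name to its installed version.

    A package that is not installed maps to None. The default package
    list is an assumption; adjust it to whatever your job depends on.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Print one line per package so the output can be compared across nodes.
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this on every node of the SLURM allocation (for example through the same launcher you use for training) should show whether all nodes see the same environment, which is worth ruling out before blaming a specific library version.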
