Hello, I am trying to use accelerate with fastai to achieve distributed training. The SLURM system that I have access to has 4 P100 GPUs. (nvidia-smi output from Tue Oct 4 13:20:24 2022: NVIDIA-SMI 510.47.03, Driver Version: 510.47.…)

Use `accelerate` in SLURM environment

devengqc November 2, 2022, 6:35pm 8

@muellerzr The actual dataset has over 1 million items for training and around 130k for validation. To reproduce the issue, you can use a smaller dls instead.

I’ll also remove the wandb callback and let you know.
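For anyone trying to reproduce this with a smaller dls, one option is to draw a fixed random subset of the training and validation items before building the DataLoaders. The snippet below is only a rough sketch under assumptions: `subsample`, `train_items`, and `valid_items` are hypothetical names, the items are stand-in integers rather than the real dataset, and the subset sizes are arbitrary. It uses only the Python standard library and does not touch the fastai or accelerate APIs.

```python
import random

def subsample(items, n, seed=42):
    """Return a reproducible random subset of at most `n` items.

    Hypothetical helper: a fixed seed means every process launched by
    the job picks the same subset.
    """
    items = list(items)
    rng = random.Random(seed)
    return rng.sample(items, min(n, len(items)))

# Stand-ins for the real item lists (e.g. file paths); sizes follow the post.
train_items = list(range(1_000_000))
valid_items = list(range(130_000))

small_train = subsample(train_items, 10_000)
small_valid = subsample(valid_items, 1_300)

print(len(small_train), len(small_valid))
```

The two smaller lists would then be passed to whatever code builds the dls in place of the full item lists.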

