I have a training script that takes the training arguments, creates a directory for the experiment run, processes the annotations from the files passed in, and trains a DETR model. My dataset class is custom and inherits from the PyTorch `Dataset` class. All of this is orchestrated by a main script that serves as the entry point.
I moved this setup over to Accelerate in order to train on multiple GPUs. I followed this tutorial and changed the relevant parts, then started training with `nohup python3 main.py --flags... &`. This uses only 1 GPU out of 4, and printing `accelerator.num_processes` returns 1.
I then tried `nohup accelerate launch main.py --flags... &` after running `accelerate config` and setting the appropriate parameters. This created 4 experiment runs/directories, which is not what I want.
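The run-directory creation looks roughly like this (a simplified sketch; `base` and the timestamp naming scheme are placeholders for my real code). My understanding is that `accelerate launch` starts 4 copies of the script, and each one executes this function, which would explain the 4 directories:

```python
import datetime
import os

def create_run_dir(base):
    # Every launched process runs this unguarded, so with 4 processes
    # you end up with 4 experiment directories
    run_name = datetime.datetime.now().strftime("run_%Y%m%d_%H%M%S_%f")
    path = os.path.join(base, run_name)
    os.makedirs(path, exist_ok=True)
    return path
```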
My training loop is a plain PyTorch loop; I am not using the Hugging Face Trainer or PyTorch Lightning. What am I doing wrong? Is there a standard practice I should follow?