Multi-GPU Hugging Face training using trl

I am trying to fine-tune Llama on multiple GPUs using the trl library, aiming for both data parallelism and model parallelism. While training with model parallelism, I noticed that gpu:0 is actively computing while the other GPUs sit idle, even though their VRAM is consumed. This seems unexpected; I assumed all GPUs would be busy during training. I am running the model in a notebook.
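Roughly, my setup looks like the sketch below (the checkpoint and dataset names are just placeholders, not my exact code); model parallelism comes from loading the model with device_map="auto" before handing it to the trl trainer:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

# Placeholder checkpoint; device_map="auto" shards the layers across all visible GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Placeholder dataset, just to make the sketch runnable end to end.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=SFTConfig(output_dir="llama-sft", per_device_train_batch_size=1),
)
trainer.train()
```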

I have raised a ticket here


It sounds like the model-parallel setup isn't fully utilizing all GPUs. When only gpu:0 is active, it usually means the workload isn't evenly split across devices. Check how the model layers are assigned to GPUs via device_map. Also, notebooks can limit multi-GPU performance; consider running your script with accelerate launch or torchrun for better parallelism.
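A rough sketch of both checks (the checkpoint name is only an example): inspect how device_map="auto" actually distributed the layers, and launch the data-parallel run from a script outside the notebook.

```python
# Inspect the layer-to-device assignment produced by device_map="auto".
from collections import Counter
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # example checkpoint
    device_map="auto",
)
# Counts modules per device, e.g. Counter({0: 17, 1: 16}) on two GPUs.
print(Counter(model.hf_device_map.values()))

# For data parallelism (one full model copy per GPU), move the training code
# into a plain script (e.g. train_sft.py), drop device_map="auto", and launch
# it from a terminal instead of the notebook:
#   accelerate launch train_sft.py
#   # or
#   torchrun --nproc_per_node=2 train_sft.py
```

Launching with accelerate launch or torchrun gives you one process per GPU, which is what keeps all of them busy instead of only gpu:0.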
