Multi-GPU Hugging Face training using TRL

I am trying to fine-tune Llama on multiple GPUs using the TRL library, and I want to use both data parallelism and model parallelism. When training with model parallelism, I noticed that gpu:0 is actively computing while the other GPUs sit idle, even though their VRAM is consumed. This seems unexpected to me; I expected all GPUs to be busy during training. I am running the model in a notebook.
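To make the setup concrete, here is a minimal sketch of what I mean by the model-parallel run. The model name, dataset, and config values are placeholders, and it assumes a recent trl version that provides SFTConfig:

```python
# Minimal sketch of a model-parallel SFT run (names and values are placeholders).
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint

# device_map="auto" shards the layers across all visible GPUs (naive pipeline-style
# model parallelism): every GPU holds a slice of the weights in VRAM, but the layers
# execute sequentially, so only one GPU is computing at any given moment.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Placeholder dataset with a "text" column, which SFTTrainer uses by default.
dataset = load_dataset("imdb", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    args=SFTConfig(output_dir="llama-sft", per_device_train_batch_size=1),
)
trainer.train()
```

From what I understand, this would explain what I am seeing: with `device_map="auto"` each batch flows through the devices one at a time, so gpu:0 (and then the others in turn) is the only one computing while the rest just hold weights. To keep every GPU busy, would I need to switch to data parallelism instead (one process per GPU, each with a full copy of the model, launched from a script with something like `accelerate launch --num_processes <n_gpus> train.py` rather than from the notebook), or a sharded setup such as FSDP or DeepSpeed ZeRO? Is that right, or am I missing a setting?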

I have raised a ticket here.
