Today I was using DPOTrainer from the trl library for DPO training, and since I wanted to train on multiple GPUs, I set it up with Accelerate. My code structure is essentially the same as the llama DPO example provided in the trl library.
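For context, here is a stripped-down sketch of what my training script looks like (the model name and dataset are placeholders, and the exact DPOTrainer arguments may differ slightly depending on the trl version):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder

# Policy model and frozen reference model for DPO
model = AutoModelForCausalLM.from_pretrained(model_name)
ref_model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Preference dataset with "prompt", "chosen", "rejected" columns (placeholder name)
train_dataset = load_dataset("my_org/my_preference_data", split="train")

training_args = TrainingArguments(
    output_dir="./dpo_output",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=5e-7,
    logging_steps=10,
    remove_unused_columns=False,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    beta=0.1,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_length=1024,
    max_prompt_length=512,
)

trainer.train()
```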
This has raised some questions for me, and I hope someone can help clarify:
- How should DPOTrainer and Accelerate be combined for training? When using the DPOTrainer class in my script, is it enough to simply start the run with `accelerate launch` (see the command sketch below)?
- I've reviewed the source code of DPOTrainer and Trainer, and I noticed that the dataset passed to DPOTrainer does not appear to be partitioned anywhere. Does this mean that every GPU loads the same data at each step? Is training supposed to work this way?
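Regarding the first question, this is roughly how I start the run right now (the script name is a placeholder, and 4 is just an example GPU count):

```bash
# One-time interactive setup: choose multi-GPU, number of processes, etc.
accelerate config

# Launch the training script; Accelerate spawns one process per GPU
accelerate launch --num_processes 4 dpo_train.py
```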
Could someone explain how DPOTrainer handles multi-GPU training, or clarify if my understanding is correct? Any insights would be greatly appreciated.