I have read the doc from Accelerate. It shows how to perform training on a single multi-GPU machine (one machine) using `accelerate config`.
I am looking for an example of how to perform training on 2 multi-GPU machines. In my setup, each machine has 4 GPUs.
What packages do I need to install? For example:
machine 1: I install accelerate & deepspeed, then run `accelerate config`.
machine 2: do I also just install accelerate & deepspeed?
Is multi-GPU training across 2 machines possible?
Like I said, you need to run `accelerate config` on both machines (and yes, you need to install everything on both of them), then run `accelerate launch training_script.py` on both machines as well.
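To make the two-machine flow concrete, here is a rough sketch of the commands. The IP address, port, and script name below are placeholders for illustration; you can also skip the `accelerate config` prompts entirely and pass the multi-node settings as flags to `accelerate launch`, as shown:

```shell
# On BOTH machines: install the same packages
pip install accelerate deepspeed

# Machine 1 (the main node, rank 0); 2 machines x 4 GPUs = 8 processes.
# 192.168.1.10 and port 29500 are example values — use your main node's
# actual reachable IP and a free port.
accelerate launch --multi_gpu \
  --num_machines 2 --num_processes 8 \
  --machine_rank 0 \
  --main_process_ip 192.168.1.10 --main_process_port 29500 \
  training_script.py

# Machine 2 (rank 1): identical command except --machine_rank 1,
# still pointing at machine 1's IP/port.
accelerate launch --multi_gpu \
  --num_machines 2 --num_processes 8 \
  --machine_rank 1 \
  --main_process_ip 192.168.1.10 --main_process_port 29500 \
  training_script.py
```

Both machines must be able to reach each other over the network on that port, and both need the training script and data available locally (or on shared storage). If you ran `accelerate config` interactively instead, answering the "How many machines" / "machine rank" / "main process IP" questions stores the same settings, and a plain `accelerate launch training_script.py` will pick them up.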