I am performing some tests with Accelerate on an HPC cluster (where Slurm is usually how we distribute computation). It works on one node with multiple GPUs, but now I want to try a multi-node setup.
I will use your launcher:
accelerate launch --config_file <config-file> <my script>
but then I need to be able to update a couple of the fields from the JSON config file in my script (so during the creation of the Accelerator?).
How can I do that? Will it work?
Am I right to think that if my setup is two nodes, each with 4 GPUs, the (ranges of) values should be:
- for "num_processes": 8 (the total number of GPUs across both nodes)
- for "num_machines": 2
- for "machine_rank": 0 or 1 (a different value on each node)
- for "distributed_type": "MULTI_GPU"
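To make it concrete, here is roughly what I have in mind for the Slurm batch script (file names my_config.yaml and my_script.py are placeholders; my understanding is that the accelerate launch CLI flags override the matching fields in the config file, which would cover the per-node machine_rank):

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --gpus-per-node=4
#SBATCH --ntasks-per-node=1

# First hostname in the allocation serves as the rendezvous address
MAIN_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)

# One launcher per node; machine_rank differs per node via SLURM_NODEID
srun accelerate launch \
  --config_file my_config.yaml \
  --num_processes 8 \
  --num_machines 2 \
  --machine_rank "$SLURM_NODEID" \
  --main_process_ip "$MAIN_ADDR" \
  --main_process_port 29500 \
  my_script.py
```

Does overriding the fields this way (on the command line rather than inside the script) look correct, or is there a supported way to change them at Accelerator creation time?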