I am using Hugging Face's run_clm.py to train the gpt-j-6b model on a machine with 8 GPUs. But it is not using all the GPUs and is throwing a CUDA out-of-memory error. I have tried changing batch_size to a multiple of the number of GPUs. Is there anything I have to specify to use all the GPUs?
This is what I am currently running.
You need to launch with
torchrun --nproc_per_node=NGPUS run_clm.py ...
to enable multi-GPU training (note the flag is --nproc_per_node, not --n_procs_per_node). Another option is to use Accelerate's CLI launcher directly:
accelerate launch --multi_gpu --num_processes=NGPUS run_clm.py ...
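As a concrete sketch for your 8-GPU case (the model name, dataset, and memory-saving flags below are illustrative assumptions, not taken from your command; all are standard run_clm.py / Trainer arguments):

```shell
# Sketch: launch run_clm.py across all 8 GPUs with torchrun (DDP).
torchrun --nproc_per_node=8 run_clm.py \
  --model_name_or_path EleutherAI/gpt-j-6B \
  --dataset_name wikitext --dataset_config_name wikitext-103-raw-v1 \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 8 \
  --gradient_checkpointing \
  --fp16 \
  --do_train \
  --output_dir ./gptj-clm
```

Note that --per_device_train_batch_size is per GPU, so the effective batch size here is 1 × 8 GPUs × 8 accumulation steps = 64. Also be aware that DDP replicates the full model on every GPU, so GPT-J-6B can still hit out-of-memory per device even with all GPUs in use; gradient checkpointing and fp16 reduce the footprint, and sharding the optimizer/model states (e.g. DeepSpeed ZeRO via the Trainer's --deepspeed flag) is the usual next step if it still does not fit.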