I have a script that uses the Hugging Face Trainer, and it works fine when I run it as a single process. But when I launch it for multi-GPU training with torchrun --nproc_per_node 4 my_script.py, I get the following error:
[rank1]: Traceback (most recent call last):
[rank1]: File "/home/jpiabrantes/rosetta/fine_tune_coder.py", line 128, in <module>
[rank1]: main()
[rank1]: File "/home/jpiabrantes/rosetta/fine_tune_coder.py", line 103, in main
[rank1]: training_args = TrainingArguments(
[rank1]: File "<string>", line 127, in __init__
[rank1]: File "/home/jpiabrantes/rosetta/.venv/lib/python3.10/site-packages/transformers/training_args.py", line 1630, in __post_init__
[rank1]: and (self.device.type == "cpu" and not is_torch_greater_or_equal_than_2_3)
[rank1]: File "/home/jpiabrantes/rosetta/.venv/lib/python3.10/site-packages/transformers/training_args.py", line 2131, in device
[rank1]: return self._setup_devices
[rank1]: File "/home/jpiabrantes/rosetta/.venv/lib/python3.10/site-packages/transformers/utils/generic.py", line 59, in __get__
[rank1]: cached = self.fget(obj)
[rank1]: File "/home/jpiabrantes/rosetta/.venv/lib/python3.10/site-packages/transformers/training_args.py", line 2063, in _setup_devices
[rank1]: self.distributed_state = PartialState(
[rank1]: File "/home/jpiabrantes/rosetta/.venv/lib/python3.10/site-packages/accelerate/state.py", line 278, in __init__
[rank1]: self.set_device()
[rank1]: File "/home/jpiabrantes/rosetta/.venv/lib/python3.10/site-packages/accelerate/state.py", line 786, in set_device
[rank1]: torch.cuda.set_device(self.device)
[rank1]: File "/home/jpiabrantes/rosetta/.venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 399, in set_device
[rank1]: torch._C._cuda_setDevice(device)
[rank1]: RuntimeError: CUDA error: invalid device ordinal
[rank1]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank1]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
[rank1]: Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
[rank2]: fails with an identical traceback, ending in the same RuntimeError: CUDA error: invalid device ordinal.
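Since the failure is inside torch.cuda.set_device(self.device), my current guess is that each worker gets a LOCAL_RANK from torchrun (0 through 3 here) and accelerate tries to bind it to that device ordinal, which fails for any rank at or above the number of GPUs the process can actually see. A small sanity-check helper I used while debugging (a sketch; the function name is mine, and it assumes CUDA_VISIBLE_DEVICES is the usual comma-separated list of device indices):

```python
import os

def max_usable_procs():
    """Return how many local processes can each claim their own GPU,
    based on CUDA_VISIBLE_DEVICES. Returns None when the variable is
    unset, meaning all physical GPUs are visible."""
    vis = os.environ.get("CUDA_VISIBLE_DEVICES")
    if vis is None:
        return None  # no restriction in place
    # e.g. "0,1" -> 2 usable ordinals; ranks >= 2 would fail
    ids = [d for d in vis.split(",") if d.strip() != ""]
    return len(ids)

print("usable processes:", max_usable_procs())
```

If this prints a number smaller than the value passed to --nproc_per_node (4 in my command), every rank at or above it would hit the "invalid device ordinal" error shown above.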