num_process = 2 won’t work
On the server where it is not working (it only works if I set num_process = 1):
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.86.01    Driver Version: 515.86.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A40          Off  | 00000000:21:00.0 Off |                    0 |
|ERR!   31C    P0   257W / 300W |   1648MiB / 46068MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A40          Off  | 00000000:81:00.0 Off |                    0 |
|  0%   29C    P0    78W / 300W |   1664MiB / 46068MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A40          Off  | 00000000:E2:00.0 Off |                    0 |
|  0%   32C    P0    77W / 300W |   3789MiB / 46068MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
torch is 2.0.1, accelerate version 0.20.3
On the server where it works (the GPUs are RTX 2080):
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:3B:00.0 Off |                  N/A |
| 22%   21C    P8     7W / 215W |      0MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:AF:00.0 Off |                  N/A |
| 22%   21C    P8    16W / 215W |      0MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:D8:00.0 Off |                  N/A |
| 23%   21C    P8    19W / 215W |      0MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
torch is 2.0.1, accelerate version 0.20.3
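
In case a more compact comparison of the two servers helps, here is a minimal sketch that reads the same per-GPU numbers through the `nvidia-smi --query-gpu` interface instead of the full table. The helper names (`parse_gpu_csv`, `query_gpus`, `busy_gpus`) and the 50% threshold are made up for illustration, and the sample string just restates the A40 values pasted above. It assumes every field comes back numeric; a driver that reports something like `[N/A]` would need extra handling.

```python
import subprocess

# Fields requested from nvidia-smi; with "nounits" the values are bare numbers.
QUERY = "index,name,utilization.gpu,memory.used,memory.total"

def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, util, used, total = [f.strip() for f in line.split(",")]
        gpus.append({
            "index": int(index),
            "name": name,
            "util_pct": int(util),
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
        })
    return gpus

def query_gpus():
    """Run nvidia-smi on the current machine and parse its output."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)

def busy_gpus(gpus, util_threshold=50):
    """Return indices of GPUs at or above the (arbitrary) utilization threshold."""
    return [g["index"] for g in gpus if g["util_pct"] >= util_threshold]

# Values copied from the A40 table above, in the CSV layout nvidia-smi would emit.
SAMPLE = """\
0, NVIDIA A40, 0, 1648, 46068
1, NVIDIA A40, 100, 1664, 46068
2, NVIDIA A40, 100, 3789, 46068
"""

sample_gpus = parse_gpu_csv(SAMPLE)
print(busy_gpus(sample_gpus))
```

On a real machine, `busy_gpus(query_gpus())` would give the same kind of list for whatever is currently running there.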