With a single GPU, I get reasonable outputs. Output single GPU: "hi there, I'm a newbie to this forum and I'm looking for some help"
As soon as I use multiple GPUs, I get: Output multi GPU: "hi there header driv EUannotuta voor measurements shooting variableslowea grayĹbestÄŻbinding"
There seems to be a deeper problem: it appears to involve an interaction between the hardware, the drivers, and the latest version of transformers/tokenizers. We have contacted NVIDIA about this.
Since this is only indirectly related to transformers, the issue can be closed.
For me, it turned out to be an NCCL issue. We had to deactivate ACS (PCIe Access Control Services) on the HPC cluster I was working on, and that resolved the problem (see: Troubleshooting - NCCL 2.19.3 documentation).
ACS was interfering with the communication between the GPUs.
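In case it helps others check their own nodes: as far as I understand the NCCL troubleshooting page, you can look at the `ACSCtl` lines in the output of `sudo lspci -vvv`, and `SrcValid+` there suggests ACS may be enabled. Here is a rough Python sketch of that check. The helper name `acs_enabled_lines` and the sample text are my own (hypothetical, not part of NCCL), and the exact lspci output may look different on your system.

```python
from typing import List

def acs_enabled_lines(lspci_output: str) -> List[str]:
    """Return the ACSCtl lines that report SrcValid+ (ACS is probably enabled there)."""
    hits = []
    for line in lspci_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("ACSCtl:") and "SrcValid+" in stripped:
            hits.append(stripped)
    return hits

# Illustrative sample only, shaped like `lspci -vvv` capability lines.
SAMPLE = (
    "\t\tACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-\n"
    "\t\tACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-\n"
)

for hit in acs_enabled_lines(SAMPLE):
    print("ACS appears enabled:", hit)
```

On a real node you would pass in the captured output of `sudo lspci -vvv` instead of `SAMPLE` (root is usually needed to see the capability lines). An empty result means no `ACSCtl` line reported `SrcValid+`.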
@Dragon777 : Is the general setup somehow different between the two cases? If the eight GPUs are on different nodes of your HPC and the four GPUs in the first case are not, I could imagine that something is going wrong with the inter-node communication. I think the NCCL performance test is a good tool for diagnosing the problem: GitHub - NVIDIA/nccl-tests: NCCL Tests
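If building nccl-tests is not convenient, a quick sanity check from PyTorch might also help narrow things down. This is only a sketch, not a replacement for nccl-tests: each rank fills a tensor with its own rank number, all ranks sum them with `all_reduce`, and every rank should end up with 0 + 1 + ... + (N-1). It assumes you launch it with `torchrun` (which sets `RANK`, `WORLD_SIZE` and `LOCAL_RANK`); the helper name `expected_all_reduce_sum` and the script layout are my own.

```python
import os

def expected_all_reduce_sum(world_size: int) -> int:
    """Sum of the ranks 0..world_size-1, which every rank should hold after all_reduce."""
    return world_size * (world_size - 1) // 2

def main() -> None:
    # Imported here so the helper above can be used without PyTorch installed.
    import torch
    import torch.distributed as dist

    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")     # reads RANK / WORLD_SIZE from the environment
    rank = dist.get_rank()
    world_size = dist.get_world_size()

    # Each rank contributes a tensor filled with its own rank number.
    t = torch.full((1024,), float(rank), device="cuda")
    dist.all_reduce(t, op=dist.ReduceOp.SUM)

    expected = float(expected_all_reduce_sum(world_size))
    ok = bool(torch.all(t == expected).item())
    status = "OK" if ok else "MISMATCH"
    print(f"rank {rank}: all_reduce {status} (got {t[0].item()}, expected {expected})")

    dist.destroy_process_group()

# Only run the GPU check when launched via torchrun.
if __name__ == "__main__" and "RANK" in os.environ:
    main()
```

For example, saved as `check_allreduce.py` (hypothetical file name) and run with `torchrun --nproc_per_node=8 check_allreduce.py`. Setting `NCCL_DEBUG=INFO` in the environment makes NCCL print more detail about which transports it picks. A mismatch or a hang here would point at GPU-to-GPU communication rather than at transformers.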
Later on we switched to the instruct version of Mistral (mistral-7b-instruct-v0.2), and then these settings had to be removed, but perhaps playing with these options will help!