Multi-GPU inference with LLM produces gibberish

There seems to be a deeper underlying problem; it appears to be caused by an interaction between the hardware, the drivers, and the latest version of transformers/tokenizers. We have been in contact with NVIDIA about this.
Since the issue is only indirectly related to transformers, this can be closed.