How can I take advantage of multiple GPUs?

You’d probably also need distributed barriers in your code for things like downloading a pretrained model only once, so that one process downloads it while the others wait. You may also need to gather the results onto a single process for post-processing or evaluation. I’d advise going through the code for a better understanding. That said, transformers doesn’t require any special multi-GPU implementation: you can use transformers models like any other PyTorch/TensorFlow model.
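To make the barrier and gather ideas concrete, here is a minimal PyTorch sketch. The helper names (`download_once`, `gather_predictions`) are made up for illustration and are not part of transformers. It assumes the process group has already been initialized by your launcher, and it falls back to plain single-process behavior when it hasn't.

```python
import torch
import torch.distributed as dist


def is_distributed():
    # True only when torch.distributed is available and a process group is initialized.
    return dist.is_available() and dist.is_initialized()


def get_rank():
    return dist.get_rank() if is_distributed() else 0


def download_once(download_fn):
    """Hypothetical helper: let rank 0 run download_fn first.

    Other ranks wait at a barrier, then call download_fn themselves,
    at which point it should hit the local cache instead of downloading again.
    """
    if is_distributed() and get_rank() != 0:
        dist.barrier()  # non-zero ranks wait here until rank 0 has finished
    result = download_fn()
    if is_distributed() and get_rank() == 0:
        dist.barrier()  # rank 0 reaches the barrier, releasing the waiting ranks
    return result


def gather_predictions(local_preds):
    """Hypothetical helper: collect each rank's predictions into one tensor.

    Assumes every rank holds a tensor of the same shape (all_gather requires it).
    """
    if not is_distributed():
        return local_preds
    gathered = [torch.zeros_like(local_preds) for _ in range(dist.get_world_size())]
    dist.all_gather(gathered, local_preds)
    return torch.cat(gathered, dim=0)
```

In a real script you would pass something like `lambda: AutoModel.from_pretrained(name)` as `download_fn`, and then run your evaluation on the gathered tensor only when `get_rank() == 0`.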
