HuggingFace summarization training example notebook raises two warnings when run on multiple GPUs

Did this help you? Using Transformers with DistributedDataParallel — any examples?