What is the cause of, and solution to, the Trainer error: CUDA RuntimeError 711?

I’m trying to fine-tune DistilBert using Trainer, following the HuggingFace tutorial:

When I try running the example, I get the following error:

RuntimeError: cuda runtime error (711) : peer mapping resources exhausted at /pytorch/aten/src/THC/THCGeneral.cpp:139

What does this error mean and how do I fix it?

Can you run your code on CPU, to get a more informative error message?

How do I specify in either the TrainingArguments or the Trainer to run on CPU?

You can just run your script with CUDA_VISIBLE_DEVICES="" in front of the command, to hide the GPUs
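If prefixing the command is awkward (for example in a notebook), the same effect can probably be had from inside Python by setting the variable before torch or transformers is imported. This is only a minimal sketch: the helper name cuda_is_visible is made up for illustration, and it assumes nothing in the process has initialised CUDA yet.

```python
import os

# Hide all GPUs from this process. This must run before torch (or
# transformers) is imported, otherwise CUDA may already be initialised.
os.environ["CUDA_VISIBLE_DEVICES"] = ""


def cuda_is_visible() -> bool:
    """Hypothetical helper: report whether PyTorch can see a CUDA device.

    Returns False if torch is not installed or if the GPUs are hidden.
    """
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


# Sanity check before building the Trainer: with the GPUs hidden,
# training should fall back to CPU.
print("CUDA visible:", cuda_is_visible())
```

Depending on the installed transformers version, TrainingArguments may also accept a no_cuda flag (newer releases offer use_cpu), so it is worth checking the documentation for your version before relying on either.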

I’m rerunning and no error emerges, but it’s as slow as mud :confused: