Custom trainer does not work on multiple GPUs

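I’m not an expert in Hugging Face, but check the call self.teacher_model.to(input_ids.device). It explicitly moves the teacher model to a single device every time it runs. If that device is plain ‘cuda’, the model ends up on the current CUDA device, which is GPU 0 by default, and that is not what you want when training on multiple GPUs.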
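If it helps, here is a rough sketch of the alternative I have in mind: place the teacher once, up front, and move the batch to the teacher instead of moving the teacher to the batch on every step. This is not your trainer code. TeacherWrapper, pick_device and the tiny stand-in teacher are hypothetical names I made up for illustration, and I am assuming the teacher takes input_ids and returns a tensor of logits.

```python
import torch
import torch.nn as nn


def pick_device() -> torch.device:
    # A bare "cuda" resolves to the current CUDA device (cuda:0 by default).
    # Falls back to CPU so the sketch also runs on a machine without a GPU.
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


class TeacherWrapper:
    """Hypothetical helper: owns the teacher and places it exactly once."""

    def __init__(self, teacher: nn.Module, device: torch.device):
        self.device = device
        # One-time placement, instead of calling .to(...) inside the loss.
        self.teacher = teacher.to(device).eval()

    @torch.no_grad()
    def logits(self, input_ids: torch.Tensor) -> torch.Tensor:
        # Move the batch to the teacher, then return the result
        # on whatever device the batch originally lived on.
        out = self.teacher(input_ids.to(self.device))
        return out.to(input_ids.device)


# Stand-in teacher (assumption): embedding + linear head, vocab 10, 3 classes.
teacher = nn.Sequential(nn.Embedding(10, 4), nn.Linear(4, 3))
wrapper = TeacherWrapper(teacher, pick_device())

input_ids = torch.tensor([[1, 2, 3]])
logits = wrapper.logits(input_ids)
```

Note that this only avoids the repeated per-step move. It does not by itself spread the teacher across GPUs, so treat it as a starting point rather than a full fix.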

Let me know if removing that call fixes it.