I’m not an expert in Hugging Face, but check `self.teacher_model.to(input_ids.device)`. That line explicitly moves the whole model to a single device. If that device is `cuda`, every layer ends up on GPU 0 (`cuda:0`), which is not what you want if the teacher is meant to be spread across several GPUs.
Let me know if removing it works.
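To illustrate why that call collapses a split model, here is a minimal sketch in plain PyTorch. It assumes the teacher's layers live on different devices, as they would after loading with something like `device_map="auto"`. Since not everyone has two GPUs handy, the sketch uses `"cpu"` and `"meta"` as stand-ins for `cuda:0` and `cuda:1`. The helper `param_devices` is hypothetical and exists only for this illustration.

```python
import torch
from torch import nn


def param_devices(model: nn.Module) -> set:
    """Hypothetical helper: the set of devices the model's parameters live on."""
    return {str(p.device) for p in model.parameters()}


# Stand-in for a teacher split across two GPUs (e.g. cuda:0 and cuda:1).
# "cpu" and "meta" are used here only so the sketch runs on any machine.
teacher = nn.Sequential(
    nn.Linear(4, 4, device="cpu"),
    nn.Linear(4, 4, device="meta"),
)
print(sorted(param_devices(teacher)))  # ['cpu', 'meta']

# Equivalent of teacher_model.to(input_ids.device): every parameter is
# moved onto the single target device, undoing the split.
teacher.to("meta")
print(sorted(param_devices(teacher)))  # ['meta']
```

On a real multi-GPU setup the second print would show a single `cuda` device. If the teacher was loaded with a `device_map`, the hooks that Accelerate attaches should already move activations between layers. That is why dropping the `.to()` call, instead of trying to fix the target device, is probably the simplest thing to try.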
I’m not an expert in hugging face, but check self.teacher_model.to(input_ids.device)
, this is explicitly moving the model to a single device, ‘cuda’ will move it to gpu:0, which is not what you want.
Lemme know if removing it works