Multi-GPU eval in PyTorch training loop with generate method


Thank you for your work. I really like the idea behind the Transformers and Accelerate libraries.

I am experimenting with TrOCR fine-tuning. Currently I train on multiple GPUs but evaluate on a single GPU, using the following code:

    if accelerator.is_main_process:
      unwrapped_model = accelerator.unwrap_model(model).to(accelerator.device)
      valid_cer = 0.0
      with torch.no_grad():
        for batch in tqdm(eval_dataloader):
          outputs = unwrapped_model.generate(batch["pixel_values"].to(accelerator.device))
          cer = compute_cer(pred_ids=outputs, label_ids=batch["labels"])
          valid_cer += cer

      accelerator.print("Validation CER:", valid_cer / len(eval_dataloader))

Is it possible to use the generate method on a parallelized model, so that evaluation runs on all GPUs instead of only the main process?