Multi-GPU eval in PyTorch training loop with generate method

Hi,

Thank you for your work. I really like the idea behind the Transformers and Accelerate libraries.

I am experimenting with TrOCR fine-tuning. Currently I train on multiple GPUs, but evaluate on a single GPU using the following code:

    if accelerator.is_main_process:
        unwrapped_model = accelerator.unwrap_model(model).to(accelerator.device)
        unwrapped_model.eval()

        valid_cer = 0.0
        with torch.no_grad():
            for batch in tqdm(eval_dataloader):
                outputs = unwrapped_model.generate(batch["pixel_values"].to(accelerator.device))
                cer = compute_cer(pred_ids=outputs, label_ids=batch["labels"])
                valid_cer += cer

        accelerator.print("Validation CER:", valid_cer / len(eval_dataloader))

Is it possible to use the generate method on a parallelized model?

Hello, yes, you can use the generate method in a multi-GPU setting. Refer to the official example script transformers/run_translation_no_trainer.py at main · huggingface/transformers (github.com).
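
For reference, here is a minimal sketch of what the evaluation loop can look like when it runs on every GPU instead of only the main process. It assumes the same model, eval_dataloader, and compute_cer as in your snippet, that eval_dataloader was passed through accelerator.prepare (so each process gets its own shard and batches are already on the right device), and an Accelerate version that provides gather_for_metrics:

    model.eval()
    valid_cer = 0.0
    with torch.no_grad():
        for batch in tqdm(eval_dataloader, disable=not accelerator.is_local_main_process):
            # generate is not exposed on the DDP-wrapped model, so unwrap it first
            unwrapped_model = accelerator.unwrap_model(model)
            generated = unwrapped_model.generate(batch["pixel_values"])
            # sequences can have different lengths per process, so pad before gathering;
            # replace 0 with your tokenizer's pad token id
            generated = accelerator.pad_across_processes(generated, dim=1, pad_index=0)
            labels = accelerator.pad_across_processes(batch["labels"], dim=1, pad_index=-100)
            # collect predictions and labels from all processes before computing the metric
            generated, labels = accelerator.gather_for_metrics((generated, labels))
            valid_cer += compute_cer(pred_ids=generated, label_ids=labels)

    accelerator.print("Validation CER:", valid_cer / len(eval_dataloader))

gather_for_metrics also drops the duplicate samples that the sharded dataloader may add to the last batch, so the metric is computed over exactly the evaluation set.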