Hi,
I am trying to fine-tune the BART model checkpoints on a large dataset (around 1M data points). Since the dataset is large, I want to use a multi-GPU setup, but I see that because of this line it’s not currently possible to train on multiple GPUs. Are there any workarounds for it?
@sshleifer Tagging you here since you’ve worked a lot with BART and summarization in particular on this repo.
Which task are you fine-tuning on?

For sequence-to-sequence tasks like summarization, examples/seq2seq/finetune.py supports multi-GPU for training only. There is a caveat: you have to run the final eval yourself on one GPU.

For language modeling tasks, multi-GPU is supported through the Trainer class.
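For reference, a multi-GPU run might look roughly like the sketch below. The flag names follow the examples/seq2seq README of that period, but the paths, hyperparameters, and checkpoint layout (e.g. the `best_tfmr` directory) are placeholders, so check `python finetune.py --help` and `python run_eval.py --help` in your checkout before relying on them.

```bash
# Sketch of a multi-GPU fine-tuning run with examples/seq2seq/finetune.py.
# All paths and hyperparameters below are placeholders.
python finetune.py \
  --model_name_or_path facebook/bart-large \
  --data_dir $DATA_DIR \
  --output_dir $OUTPUT_DIR \
  --gpus 2 \
  --do_train \
  --train_batch_size 4 \
  --learning_rate 3e-5

# The caveat above: run the final eval yourself on a single GPU afterwards,
# e.g. with run_eval.py from the same directory (arguments from memory of the
# README -- verify against --help).
CUDA_VISIBLE_DEVICES=0 python run_eval.py \
  $OUTPUT_DIR/best_tfmr \
  $DATA_DIR/test.source \
  test_generations.txt \
  --reference_path $DATA_DIR/test.target \
  --task summarization \
  --bs 16
```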
Thanks for the reply. It’s a seq2seq task, but wouldn’t the assert condition fail during training if I specify multiple GPUs in the training command? Do you mean I can comment out that part and then run the script?
I am okay with the caveat.
Hi @dakshvar22,

You won’t need to comment that line out; just set the sortish_sampler argument to False. Anyway, it’s False by default, so you won’t need to change anything.
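In other words, assuming `--sortish_sampler` is the usual store_true flag in finetune.py’s argument parser, the multi-GPU command simply omits it; you would only pass it for a single-GPU run, roughly like this (verify the flag name with `--help` in your version):

```bash
# Multi-GPU: leave --sortish_sampler out (it defaults to off), so the
# multi-GPU + sortish sampler assert is never hit.
python finetune.py --model_name_or_path facebook/bart-large \
  --data_dir $DATA_DIR --output_dir $OUTPUT_DIR --gpus 2 --do_train

# Single-GPU: opting in to the sortish sampler is fine here.
python finetune.py --model_name_or_path facebook/bart-large \
  --data_dir $DATA_DIR --output_dir $OUTPUT_DIR --gpus 1 --do_train --sortish_sampler
```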