Multi-GPU fine-tuning of BART

I am trying to fine-tune the BART model checkpoints on a large dataset (around 1M data points). Since the dataset is large, I want to use a multi-GPU setup, but I see that because of this line it’s not currently possible to train in a multi-GPU setting. Are there any workarounds for it?

@sshleifer Tagging you here since you’ve worked with BART and summarization in particular a lot on the repo. :slight_smile:

Which task are you finetuning on?

For sequence-to-sequence tasks, like summarization, examples/seq2seq/ supports multi-GPU for training only. There is a caveat: you have to run the final eval yourself on a single GPU.

For language modeling tasks, multi-GPU training is supported through the Trainer class.
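As a rough sketch of what that means in practice: when more than one GPU is visible, the Trainer wraps the model in `torch.nn.DataParallel` so each batch is split across devices. The tiny linear model below is purely illustrative, not part of BART or the Trainer source.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model, just to show the wrapping.
model = nn.Linear(8, 2)

# Sketch of the multi-GPU path: replicate the model and scatter each
# batch across all visible GPUs. With 0 or 1 GPUs this branch is skipped
# and the forward pass runs unchanged.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

batch = torch.randn(4, 8)   # batch of 4 examples, 8 features each
out = model(batch)          # shape (4, 2) either way
```

The point is that no script changes are needed: the same training loop works on one GPU or several.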

Thanks for the reply. It’s a seq2seq task but wouldn’t the assert condition fail during training if I specify multiple GPUs in the training command? Do you mean I can comment out that part and then run the script?
I am okay with the caveat.

hi @dakshvar22
You won’t need to comment out that line; just set the sortish_sampler argument to False. In any case, it’s False by default, so you won’t need to change anything.
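For reference, a launch might look something like the sketch below. The script and flag names are assumptions based on the examples/seq2seq/ README of that era (a PyTorch Lightning-based `finetune.py`) and may differ in your checkout; the paths and GPU count are placeholders.

```bash
# Hypothetical invocation sketch; verify flags against your local
# examples/seq2seq/finetune.py --help before running.
python finetune.py \
  --model_name_or_path facebook/bart-large \
  --data_dir ./my_seq2seq_data \
  --output_dir ./bart_finetuned \
  --do_train \
  --gpus 2
# Note: no --sortish_sampler flag is passed, so it stays at its False default.
```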
