I am trying to fine-tune the BART model checkpoints on a large dataset (around 1M data points). Since the dataset is large, I want to use a multi-GPU setup, but I see that because of this line it’s not currently possible to train in a multi-GPU setting. Are there any workarounds?
@sshleifer Tagging you here since you’ve worked with BART and summarization in particular a lot on the repo.
Which task are you finetuning on?
For sequence-to-sequence tasks like summarization,
examples/seq2seq/finetune.py supports multi-GPU for training only. There is a caveat: you have to run the final eval yourself on a single GPU.
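For reference, a multi-GPU training invocation might look roughly like the sketch below. This is only an illustration: the exact flag names (e.g. `--gpus`, `--train_batch_size`) depend on your transformers version and the script’s argparse setup, and the data/output paths are placeholders, so check `python finetune.py --help` before running.

```bash
# Hypothetical sketch -- verify flags against your version of finetune.py.
python finetune.py \
  --model_name_or_path facebook/bart-large \
  --data_dir ./my_dataset \
  --output_dir ./bart_finetuned \
  --do_train \
  --gpus 4 \
  --train_batch_size 4 \
  --num_train_epochs 1
# Afterwards, run the final eval separately on a single GPU, per the caveat above.
```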
For language modeling tasks, multi-GPU is supported through the
Thanks for the reply. It’s a seq2seq task, but wouldn’t the assert condition fail during training if I specify multiple GPUs in the training command? Do you mean I should comment out that part and then run the script?
I’m okay with the caveat.
You won’t need to comment out that line; just set the `sortish_sampler` argument to `False`. Anyway, it’s `False` by default, so you won’t need to change anything.