BigBird-RoBERTa batch size

Hello, I’m wondering what batch size I should expect to fit with this model. When I run BigBird-RoBERTa with fp16, a sequence length of 2048, and sharded DDP, I can only fit a batch size of 1 per GPU. I’m running on 8 32GB GPUs. My train set isn’t that large (about 200k documents), but could the problem be that the whole train set is being loaded onto the GPU at once? According to the paper, the authors were able to pretrain with a significantly larger batch size. Is there anything I can do beyond my current setup? Thanks for your help.
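
For reference, here is roughly what my setup looks like. This is a simplified sketch using the Hugging Face `Trainer` with `sharded_ddp="simple"`; the masked-LM head, the toy dataset, and the output directory name are just placeholders standing in for my real data pipeline:

```python
# Rough sketch of my setup (the toy corpus stands in for my real ~200k-document train set)
from datasets import Dataset
from transformers import (
    BigBirdForMaskedLM,
    BigBirdTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = BigBirdTokenizerFast.from_pretrained("google/bigbird-roberta-base")
model = BigBirdForMaskedLM.from_pretrained("google/bigbird-roberta-base")

# Placeholder corpus; my real documents are tokenized the same way to length 2048
raw = Dataset.from_dict({"text": ["some long document text ..."] * 32})
train_dataset = raw.map(
    lambda ex: tokenizer(
        ex["text"], truncation=True, padding="max_length", max_length=2048
    ),
    batched=True,
    remove_columns=["text"],
)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True)

training_args = TrainingArguments(
    output_dir="bigbird-mlm",
    fp16=True,                      # mixed precision
    per_device_train_batch_size=1,  # anything larger runs out of memory for me
    sharded_ddp="simple",           # fairscale sharded DDP
    max_steps=1_000,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=data_collator,
)
trainer.train()
```

I launch it across the 8 GPUs with something like `python -m torch.distributed.launch --nproc_per_node=8 train.py`.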