Roberta-base takes too long

LearnToGrow · August 3, 2022, 8:44pm

Hello,
I am fine-tuning roberta-base using this script on ~8M examples for classification. I am using 6 NVIDIA RTX A6000 GPUs. Max length is 64, batch_size=32, and gradient_accumulation_steps=4. One epoch takes ~8 hours. Is this normal? I see that the original model was trained on 1024 GPUs.
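To put the 8 hours in perspective, here is a rough back-of-the-envelope sketch that turns the numbers above into throughput. It assumes batch_size=32 is the per-GPU batch size and that all 6 GPUs run data-parallel; neither is confirmed by the script settings quoted here, and the function name `epoch_stats` is made up for illustration.

```python
import math


def epoch_stats(num_examples, num_gpus, per_device_batch, grad_accum, epoch_hours):
    """Rough throughput estimate for one epoch (hypothetical helper).

    Assumes per_device_batch is the batch size on each GPU and that
    every GPU processes a different shard of the data (data parallel).
    """
    # Examples consumed per optimizer update across all GPUs.
    effective_batch = num_gpus * per_device_batch * grad_accum
    # Optimizer updates needed to see every example once.
    optimizer_steps = math.ceil(num_examples / effective_batch)
    # Forward/backward passes each GPU runs in one epoch.
    micro_batches_per_gpu = math.ceil(num_examples / (num_gpus * per_device_batch))
    epoch_seconds = epoch_hours * 3600
    examples_per_sec = num_examples / epoch_seconds
    return {
        "effective_batch": effective_batch,
        "optimizer_steps": optimizer_steps,
        "micro_batches_per_gpu": micro_batches_per_gpu,
        "examples_per_sec": examples_per_sec,
        "examples_per_sec_per_gpu": examples_per_sec / num_gpus,
        "sec_per_micro_batch": epoch_seconds / micro_batches_per_gpu,
    }


# Numbers from the post: ~8M examples, 6 GPUs, batch 32, accumulation 4, ~8 h.
stats = epoch_stats(8_000_000, 6, 32, 4, 8)
for name, value in stats.items():
    print(f"{name}: {value:,.2f}")
```

Under those assumptions this works out to roughly 278 examples/s overall, or about 46 examples/s per GPU, with an effective batch size of 768. If batch_size=32 is actually the total across all GPUs, the per-step numbers change accordingly.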
