Finetuning ByT5 with a batch size of 1 on T4 GPU

lersouza · June 30, 2022, 3:20pm

Hey guys!

I am working with the ByT5 model and TyDi QA.

Specifically, I am fine-tuning a ByT5 model on per-language subsets of Tydi-GoldP. I started out with PyTorch on an A100 GPU (80 GB), but I am now moving to a T4 GPU with roughly 16 GB.

On the A100, I used an input sequence length of 2048, an output length of 768, a learning rate of 1e-4 (Adafactor), a batch size of 6, and gradient accumulation over 4 steps.

To move to the T4, I switched to a batch size of 1 with gradient accumulation over 24 steps, so the effective batch size stays the same (6 × 4 = 1 × 24 = 24).
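To make the arithmetic concrete, here is a minimal sketch in plain Python (not my actual training code). It checks that both setups give an effective batch size of 24, and that on a toy one-parameter regression with a per-example mean loss, accumulating scaled micro-batch gradients gives the same gradient either way. The helper names `effective_batch_size` and `accumulated_grad` and the toy data are made up for illustration; a real ByT5 run also involves per-token loss averaging, padding, and numeric precision, none of which this sketch models.

```python
import math

def effective_batch_size(micro_batch, accumulation_steps):
    # Number of examples that contribute to one optimizer step.
    return micro_batch * accumulation_steps

def accumulated_grad(w, xs, ys, micro_batch):
    # Gradient of the mean squared error for y ~ w * x, built up by
    # gradient accumulation: each micro-batch gradient is divided by
    # the number of accumulation steps before being summed.
    steps = len(xs) // micro_batch
    grad = 0.0
    for s in range(steps):
        chunk = range(s * micro_batch, (s + 1) * micro_batch)
        micro_grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in chunk) / micro_batch
        grad += micro_grad / steps
    return grad

# The two configurations from this post.
a100_effective = effective_batch_size(6, 4)
t4_effective = effective_batch_size(1, 24)

# Toy data: 24 examples, i.e. one effective batch.
xs = [0.1 * i for i in range(24)]
ys = [3.0 * x + 1.0 for x in xs]

grad_a100 = accumulated_grad(0.5, xs, ys, micro_batch=6)
grad_t4 = accumulated_grad(0.5, xs, ys, micro_batch=1)

print(a100_effective, t4_effective)
print(math.isclose(grad_a100, grad_t4, rel_tol=1e-9))
```

So, at least in this idealized setting, the two configurations should be taking the same optimizer step.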

However, the results are far apart. On the A100 I got an F1 score of 60 on the English subset, but on the T4 I cannot get past 30 F1. The loss on the A100 is about 0.01, whereas on the T4 the mean loss is around 0.2 to 0.3.

Do you guys have any tips or comments about that? I didn’t find any papers or reported results on differing or degraded performance with a batch size of 1.

Thank you!
