How can I reduce memory usage at inference time for models trained from scratch?

I am training language models from scratch on my own custom dataset/vocabulary for a scientific domain, using both the original BERT and DistilBERT architectures.

After training, I ran some experiments comparing the inference memory usage of my models against SciBERT, but SciBERT clearly outperforms mine, especially at larger batch_size values and longer input sequences.
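
For reference, this is roughly how I measure peak GPU memory at inference time (a minimal sketch; the sample text, sequence length, and batch sizes are just placeholders for my actual setup):

```python
import torch
from transformers import AutoTokenizer, AutoModel

device = torch.device("cuda")
model_name = "allenai/scibert_scivocab_uncased"  # or the path to my own checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).to(device).eval()

text = "A representative sentence from my scientific corpus."  # placeholder input
for batch_size in (1, 8, 32):
    inputs = tokenizer(
        [text] * batch_size,
        return_tensors="pt",
        padding="max_length",
        max_length=256,
        truncation=True,
    ).to(device)

    torch.cuda.reset_peak_memory_stats(device)  # start a fresh peak-memory measurement
    with torch.no_grad():
        model(**inputs)
    peak_mb = torch.cuda.max_memory_allocated(device) / 1024 ** 2
    print(f"batch_size={batch_size}: peak {peak_mb:.1f} MB")
```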

DistilBERT actually has fewer layers than SciBERT, so I assumed it would use less memory, but that is not the case.
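
This is how I sanity-check the layer and parameter counts of the two models (the local checkpoint path is a placeholder for mine):

```python
from transformers import AutoConfig, AutoModel

for name in ("allenai/scibert_scivocab_uncased", "./my-distilbert-from-scratch"):
    config = AutoConfig.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    # DistilBERT configs store the layer count as n_layers but also expose num_hidden_layers.
    n_layers = getattr(config, "num_hidden_layers", getattr(config, "n_layers", None))
    print(f"{name}: {n_layers} layers, {n_params / 1e6:.1f}M parameters")
```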

Can anyone please advise me on how to reduce the memory usage at inference time for a language model trained from scratch?