I am new to deep learning and I am training my first XLM-RoBERTa (base)-style language model using the Trainer API and TPUs. In my case, the Trainer API is slightly customized to incorporate a batch sampler. I started training the language model on Google Colab and everything worked fine: RAM usage never exceeded 8 or 9 GB. Over time, though, the usage started to grow significantly. On the last training run, resuming from the last checkpoint, it required around 55 GB of RAM, while today it required 34 GB.
I don’t even know whether this is normal, but to my understanding it is not normal at all. The problem is that I don’t know how to troubleshoot it. Can anybody please guide me on how to solve this?
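If it helps to narrow things down, here is a minimal sketch of the kind of per-step memory logging I could add to the training loop (the `peak_ram_gb` helper name is my own, and it uses only the standard-library `resource` module, which reports peak resident memory on Linux, where `ru_maxrss` is in kilobytes):

```python
import resource


def peak_ram_gb():
    """Return this process's peak resident memory in GB.

    On Linux, ru_maxrss is reported in kilobytes, so dividing by
    1024**2 converts it to gigabytes.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024 ** 2


# Printing this every few hundred training steps would show whether memory
# grows steadily (suggesting something is kept alive across steps, e.g.
# accumulated tensors or logs) or jumps only around checkpoint saves/loads.
print(f"peak RSS so far: {peak_ram_gb():.2f} GB")
```

My thinking is that if the logged value climbs monotonically with the step count, that would point at a leak in the custom batch sampler or the training loop rather than normal checkpoint overhead.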