Colab session crashing after using all available RAM

I am trying to follow the Hugging Face article "How to train a new language model from scratch using Transformers and Tokenizers", but my Colab session crashes after using all available RAM. It happens when I run the function that builds the training dataset.

I am using a Sinhala-language dataset for this. The dataset is about 250 MB, and I am loading it from Google Drive.

This is the link to the Colab notebook:

Could you please go through it and tell me what I am missing here?
Thanks in advance!