Colab error (memory crashes)

How big is each record? How big is it after tokenization?
Are you using a data loader? What batch size is it using? A quick way to check is sketched below.
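
A minimal sketch for inspecting those sizes, assuming a Hugging Face `datasets` Dataset with a `"text"` column and a `tokenizer` already loaded in your notebook (adjust the names to whatever you actually use):

```python
import sys

sample = dataset[0]                         # first raw record
print("raw bytes:", sys.getsizeof(sample["text"]))

encoded = tokenizer(sample["text"])         # tokenize one record
print("token count:", len(encoded["input_ids"]))
```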

What happens if you try to train using only 10 records?
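
Something like this, again assuming a `datasets` Dataset and a `transformers` `Trainer` setup (the `model` and `output_dir` are placeholders for your own):

```python
from transformers import Trainer, TrainingArguments

tiny = dataset.select(range(10))            # keep just the first 10 records

trainer = Trainer(
    model=model,                            # your existing model
    args=TrainingArguments(output_dir="tmp_test",
                           per_device_train_batch_size=2),
    train_dataset=tiny,
)
trainer.train()                             # if this still crashes, the
                                            # problem isn't dataset size
```

If the 10-record run succeeds, the crash is likely coming from the full dataset or the batch size; if it still crashes, look at the model size or something else in the notebook holding memory.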