The dataset is fairly large (1 million triplets) and I’m running into memory issues in Google Colab. What is the best alternative here?
So far I can think of these three options, but I’m not sure which is best:
- Fine-tune using a streaming dataset. Is this possible?
- Fine-tune on a smaller subset of the dataset. But I hit the same memory issues even with just 1% of the dataset…
- Pay for Google Colab Pro. But I’m not sure this will be enough.
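
For the streaming option, this is roughly what I have in mind: a minimal sketch, assuming the triplets sit in a tab-separated file with one anchor/positive/negative triplet per line. The function names (`stream_triplets`, `batched`) and the file layout are my own placeholders, not part of any library.

```python
import csv
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Hypothetical layout: one triplet per line, tab-separated (anchor, positive, negative).
Triplet = Tuple[str, str, str]


def stream_triplets(path: str) -> Iterator[Triplet]:
    """Yield triplets one row at a time, so the full file is never held in memory."""
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) == 3:  # skip blank or malformed lines
                yield row[0], row[1], row[2]


def batched(triplets: Iterable[Triplet], batch_size: int) -> Iterator[List[Triplet]]:
    """Group a lazy stream of triplets into lists of at most batch_size items."""
    it = iter(triplets)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

Each batch from `batched(stream_triplets(path), batch_size)` would then go to the training step. This only limits the memory taken by the data itself, so I'm not sure it helps if the model or the batch size is what actually runs out of memory, which the 1% result might suggest.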