I’m trying to fine-tune this Sentence Transformers model from the Hub on the Portuguese subset of this dataset.
The dataset is fairly large (1 million triplets) and I’m running into out-of-memory errors in Google Colab. What’s the best way around this?
So far I can think of three options, but I’m not sure which is best:
- Fine-tune using a streaming dataset. Is this possible?
- Fine-tune on a smaller subset of the dataset. But I run into the same memory issues with even just 1% of the data…
- Pay for Google Colab Pro. But I’m not sure the extra memory will be enough.
Any suggestions?