I trained a 90M parameter embedding model from scratch

I trained a 90M-parameter encoder-only (embedding) model from scratch. I trained it mostly on Google Colab with a Colab Pro+ subscription. This was roughly the fifth run, as earlier attempts had issues with exploding gradients.


It was a fun project, though the model is not yet near SOTA quality. I also managed to run inference with AutoModel. It uses the e5-base-v2 tokenizer.
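For anyone who wants to try the same inference setup, here is a minimal sketch using AutoModel and AutoTokenizer. Mean pooling over the last hidden state is just an assumption here; check the model card for the intended pooling strategy.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Sketch of AutoModel inference. Mean pooling is an assumption;
# the model card may specify a different pooling strategy.
model_id = "pranavupadhyaya52/rocky-embed"
tokenizer = AutoTokenizer.from_pretrained(model_id)  # e5-base-v2 tokenizer
model = AutoModel.from_pretrained(model_id).eval()

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch).last_hidden_state
    # Mask-aware mean pooling: average only over real (non-padding) tokens.
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (out * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

print(embed(["A quick test sentence."]).shape)  # (1, hidden_size)
```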

Training was distillation-based, with CohereLabs/wikipedia-2023-11-embed-multilingual-v3-int8-binary on Hugging Face providing the teacher embeddings. Contrastive training would likely have taken more time. The learning rate started at 1e-5, with the first checkpoint after 5,000 learning steps.
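For context, a distillation step against precomputed teacher vectors can look roughly like this; MSE on L2-normalized embeddings is just one common choice of loss, and the 768-dim size is only illustrative:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_emb: torch.Tensor, teacher_emb: torch.Tensor) -> torch.Tensor:
    # Push the student's embedding toward the precomputed teacher embedding.
    # Normalizing both sides makes the loss insensitive to vector scale.
    student = F.normalize(student_emb, dim=-1)
    teacher = F.normalize(teacher_emb, dim=-1)
    return F.mse_loss(student, teacher)

# Toy batch standing in for student outputs and teacher targets (768-dim is illustrative).
student = torch.randn(8, 768, requires_grad=True)
teacher = torch.randn(8, 768)
loss = distillation_loss(student, teacher)
loss.backward()
print(loss.item())
```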

Training ran for 50k steps in total, with a general model health check and a checkpoint every further 5k steps up to 50k.

I evaluated it on the STS benchmark.


Spearman correlation: 0.5453
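If you want to run an evaluation along these lines, here is a sketch using the embed() helper from the inference example above. It assumes GLUE's STS-B validation split, which may not be the exact split I used:

```python
from datasets import load_dataset
from scipy.stats import spearmanr
import torch.nn.functional as F

# Score STS-B: Spearman correlation between cosine similarities and gold labels.
# Uses embed() from the inference snippet above; the split choice is an assumption.
data = load_dataset("glue", "stsb", split="validation")
sims, golds = [], []
for row in data:
    e = F.normalize(embed([row["sentence1"], row["sentence2"]]), dim=-1)
    sims.append(float(e[0] @ e[1]))  # cosine similarity of the normalized pair
    golds.append(row["label"])
rho, _ = spearmanr(sims, golds)
print("Spearman:", rho)
```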


If anyone would like to try the model, it's on Hugging Face at pranavupadhyaya52/rocky-embed.
