Successive Matryoshka training - Healthcare concepts

Constraining model capacity with a Matryoshka wrapper makes the modeling task more difficult, which should, in theory, improve capability per parameter.

We are using a cross-encoder to train a Transformer model on the task of semantic similarity of healthcare concepts.
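For context, here is a minimal sketch of that kind of setup using the Sentence Transformers `CrossEncoder` API. The model name and the toy concept pairs are placeholders, not our actual data:

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder

# Toy stand-ins for healthcare concept pairs with similarity labels;
# the real training set goes here.
train_examples = [
    InputExample(texts=["myocardial infarction", "heart attack"], label=0.95),
    InputExample(texts=["myocardial infarction", "tension headache"], label=0.05),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=2)

# A single-output cross-encoder that scores the similarity of a concept pair.
model = CrossEncoder("distilroberta-base", num_labels=1)
model.fit(train_dataloader=loader, epochs=1, warmup_steps=0)
```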

Has anyone explored the potential benefit of successive Matryoshka training, where each run uses a progressively less coarse Matryoshka function?

The idea would be for the initial runs to provide an improved underlying structure for the later runs to build upon, much like the way a sculptor works: first chiseling out a rough form before honing the details.
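To make the proposal concrete, here is a hypothetical sketch of the staged schedule. Note that `MatryoshkaLoss` in Sentence Transformers is implemented for bi-encoders, so the sketch uses a `SentenceTransformer`; applying the same idea to our cross-encoder would require a custom truncation wrapper. The model, stage dims, base loss, and data are all illustrative assumptions:

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # 384-dim

train_examples = [
    InputExample(texts=["myocardial infarction", "heart attack"], label=0.95),
    InputExample(texts=["myocardial infarction", "tension headache"], label=0.05),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=2)

# Successive stages: each run trains with a finer (less coarse) set of
# truncation dims than the last, so the early runs carve out the coarse,
# low-dimensional structure that the later runs refine.
stages = [
    [64],                 # rough form: only the coarsest prefix
    [64, 128, 256],       # intermediate detail
    [64, 128, 256, 384],  # full Matryoshka ladder
]

for dims in stages:
    base_loss = losses.CoSENTLoss(model)  # any pairwise similarity loss works
    stage_loss = losses.MatryoshkaLoss(model, base_loss, matryoshka_dims=dims)
    model.fit(train_objectives=[(loader, stage_loss)], epochs=1, warmup_steps=0)
```

`MatryoshkaLoss` simply wraps whatever base loss is already in use, so the staging only changes which truncation dims each run optimizes.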

Thank you.