Hi @nreimers,
I find your research on bi-encoders and models on sbert.net super helpful. Based on your research I understand that cross-encoders generally perform better than bi-encoders, while their main disadvantage is computational speed.
I’m very interested in deepening my research on cross-encoders, but I noticed that you’ve published comparatively few of them here: cross-encoder (Sentence Transformers - Cross-Encoders).
My question: Could you consider publishing improved cross-encoders, trained either on your paraphrase data or on the ‘all’ data from the FLAX event (‘all-mpnet…’ etc.)?
I feel this would add great value for the HF and research communities, because:
- Improved cross-encoders trained on more diverse data could make for greatly improved STILTs for sequential transfer learning applications (see https://arxiv.org/pdf/1811.01088.pdf).
- Your bi-encoders are probably already good STILTs, but I imagine that cross-encoders would be even better. Using these intermediate models for task-specific fine-tuning would probably be a super easy way for people to get improved performance on many tasks, just by taking your cross-encoder as the base model instead of BERT-base etc.
- High-performance cross-encoders would also be useful for implementing BM25 + cross-encoder reranking in information retrieval applications.
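To illustrate the last point, here is a minimal, self-contained sketch of the retrieve-then-rerank pattern. The BM25 scoring is implemented inline rather than via a library, and the cross-encoder stage is a hypothetical token-overlap stand-in; in practice one would score the (query, passage) pairs with `CrossEncoder.predict` from sentence-transformers, as noted in the comments.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of each document against the query."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs
    df = Counter()  # document frequency of each term
    for d in docs_tokens:
        df.update(set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if tf[t] == 0:
                continue
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

corpus = [
    "Cross-encoders jointly encode the query and the passage.",
    "Bi-encoders embed query and passage independently.",
    "BM25 is a classic lexical retrieval function.",
]
docs_tokens = [doc.lower().split() for doc in corpus]
query = "lexical retrieval with bm25"
query_tokens = query.lower().split()

# Stage 1: cheap lexical retrieval with BM25 over the whole corpus.
scores = bm25_scores(query_tokens, docs_tokens)
top_k = sorted(range(len(corpus)), key=lambda i: -scores[i])[:2]

# Stage 2: rerank the candidates with a cross-encoder. In practice:
#   from sentence_transformers import CrossEncoder
#   ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
#   ce_scores = ce.predict([(query, corpus[i]) for i in top_k])
# A trivial token-overlap stand-in keeps this sketch self-contained:
def stub_cross_encoder(pairs):
    return [len(set(q.lower().split()) & set(p.lower().split())) for q, p in pairs]

ce_scores = stub_cross_encoder([(query, corpus[i]) for i in top_k])
reranked = [corpus[i] for _, i in sorted(zip(ce_scores, top_k), reverse=True)]
print(reranked[0])
```

The point of the two-stage design is exactly the speed/quality trade-off above: BM25 narrows the corpus cheaply, so the expensive cross-encoder only has to score a handful of candidates.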
Could you consider publishing improved cross-encoders?
(Maybe there are technical reasons why your paraphrase or ‘all’ data cannot be used for cross-encoders, and that’s why none are published with this data?)
Best,
Moritz