Hi @nreimers,
I find your research on bi-encoders and models on sbert.net super helpful. Based on your research I understand that cross-encoders generally perform better than bi-encoders, while their main disadvantage is computational speed.
I’m very interested in deepening my research on cross-encoders, but I noticed that you’ve published comparatively few of them here: cross-encoder (Sentence Transformers - Cross-Encoders).
My question: Could you consider publishing improved cross-encoders, trained either on your paraphrase data or on the ‘all’ data from the FLAX event (‘all-mpnet…’ etc.)?
I feel this would add great value for the HF and research communities, because:
- Improved cross-encoders trained on more diverse data could make for greatly improved STILTs for sequential transfer learning applications (see https://arxiv.org/pdf/1811.01088.pdf).
- Your bi-encoders are probably already good STILTs, but I imagine that cross-encoders would be even better. Using these intermediate models for task-specific fine-tuning would probably be a super easy way for people to get improved performance on many tasks, just by taking your cross-encoder as the base model instead of BERT-base etc.
- High-performance cross-encoders would also be useful for implementing BM25 + cross-encoder reranking in information retrieval applications.
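To illustrate the last point, here is a minimal, self-contained sketch of the retrieve-then-rerank pattern. The BM25 scoring is implemented inline rather than via a library, and the cross-encoder stage is a hypothetical token-overlap stand-in; in practice one would score the (query, passage) pairs with `CrossEncoder.predict` from sentence-transformers, as noted in the comments.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of each document against the query."""
    n_docs = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / n_docs
    df = Counter()  # document frequency of each term
    for d in docs_tokens:
        df.update(set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if tf[t] == 0:
                continue
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1)
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

corpus = [
    "Cross-encoders jointly encode the query and the passage.",
    "Bi-encoders embed query and passage independently.",
    "BM25 is a classic lexical retrieval function.",
]
docs_tokens = [doc.lower().split() for doc in corpus]
query = "lexical retrieval with bm25"
query_tokens = query.lower().split()

# Stage 1: cheap lexical retrieval with BM25 over the whole corpus.
scores = bm25_scores(query_tokens, docs_tokens)
top_k = sorted(range(len(corpus)), key=lambda i: -scores[i])[:2]

# Stage 2: rerank the candidates with a cross-encoder. In practice:
#   from sentence_transformers import CrossEncoder
#   ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
#   ce_scores = ce.predict([(query, corpus[i]) for i in top_k])
# A trivial token-overlap stand-in keeps this sketch self-contained:
def stub_cross_encoder(pairs):
    return [len(set(q.lower().split()) & set(p.lower().split())) for q, p in pairs]

ce_scores = stub_cross_encoder([(query, corpus[i]) for i in top_k])
reranked = [corpus[i] for _, i in sorted(zip(ce_scores, top_k), reverse=True)]
print(reranked[0])
```

The point of the two-stage design is exactly the speed/quality trade-off above: BM25 narrows the corpus cheaply, so the expensive cross-encoder only has to score a handful of candidates.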
Could you consider publishing improved cross-encoders?
(Maybe there are technical reasons why your paraphrase or ‘all’ data cannot be used for cross-encoders, and that’s why none are published with this data?)
Best,
Moritz