In the current ecosystem, vocab-transformers seem to occupy a small niche compared to the much larger sentence-transformers collection.
Now that BGE and GTE are available, I was curious whether either would be a good fit for a current project. To find out, I looked at their vocabularies, which are WordPiece but largely consist of whole words. Shouldn't the embeddings these models produce for single words therefore be appropriate? I suppose it would also be important to check the model architectures: even the vocab-transformers word2vec distilbert model has additional modules in its embedding pipeline beyond the plain embedding lookup.
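As a quick sanity check on the "largely whole words" observation, one can ask the tokenizer directly whether a word maps to a single vocabulary entry. A minimal sketch, assuming the `BAAI/bge-small-en-v1.5` checkpoint (any sentence-transformers model with a WordPiece tokenizer would do):

```python
from transformers import AutoTokenizer

# Load only the tokenizer, not the full model weights.
# Model name is an assumption; BGE uses a BERT-style WordPiece vocab.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-small-en-v1.5")

def is_single_token(word: str) -> bool:
    """True if the word is one vocab entry (no ## continuation pieces)."""
    return len(tokenizer.tokenize(word)) == 1

for word in ["finance", "river", "electroencephalography"]:
    print(word, is_single_token(word))
```

Common words like "finance" are single entries, while rarer ones split into pieces. For the architecture question, printing a `SentenceTransformer` instance shows its full module pipeline (e.g. Transformer, Pooling, Normalize), which makes clear that more than an embedding lookup is involved.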
In that sense, I wonder whether there shouldn't be a link between certain sentence-transformers models and a corresponding place in the vocab-transformers collection, to encourage people to use more SOTA models?