Models for Multi-lingual Embeddings (similarity search)?

AnhND · August 26, 2023, 11:17am

Good day,

I have a use case for similarity-based text search in a non-English language (Vietnamese in particular).

I’m hoping to get some pointers on how to use a Hugging Face model to generate embeddings (for a vector DB). I’ve found one in particular that looks promising: VoVanPhuc/sup-SimCSE-VietNamese-phobert-base. It’s a Transformers model tagged for ‘sentence similarity’. This is what I found from searching:

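import torch

# Generate the embeddings using the model
# (assumes `model` and `encoded_input` were created with AutoModel / AutoTokenizer)
with torch.no_grad():
    model_output = model(**encoded_input)
    # Mean over the token dimension; note that this also averages padding tokens
    embeddings = model_output.last_hidden_state.mean(dim=1)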
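For reference, here is a fuller sketch of how I understand that snippet would fit together. It is untested against this particular model, so treat it as a starting point. The helper names `masked_mean_pool` and `embed` are my own, `max_length=256` is an assumption based on PhoBERT's usual limit, and the mean pooling is weighted by the attention mask so that padding tokens do not dilute the average. PhoBERT-based models generally expect word-segmented Vietnamese input, so the model card should be checked for the recommended preprocessing and pooling.

```python
import torch


def masked_mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average the token vectors of each sentence, ignoring padding positions."""
    # (batch, seq) -> (batch, seq, 1) so the mask broadcasts over the hidden size
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)  # guard against division by zero
    return summed / counts


def embed(sentences, model_name="VoVanPhuc/sup-SimCSE-VietNamese-phobert-base"):
    """Return one L2-normalised embedding per sentence (downloads the model on first use)."""
    # Imported here so the pooling helper above only needs torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    # max_length=256 is an assumption; adjust it to the model's actual limit
    encoded_input = tokenizer(
        sentences, padding=True, truncation=True, max_length=256, return_tensors="pt"
    )
    with torch.no_grad():
        model_output = model(**encoded_input)

    pooled = masked_mean_pool(model_output.last_hidden_state, encoded_input["attention_mask"])
    # Unit-length vectors make the dot product equal to cosine similarity
    return torch.nn.functional.normalize(pooled, p=2, dim=1)
```

With this, `emb = embed([...])` would give one row per sentence, and `emb @ emb.T` would give pairwise cosine similarities, which is the quantity a vector DB configured for cosine or dot-product distance ranks by.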

Also, I’m not sure whether I’m looking at the right task (I’m new to Hugging Face).

Much appreciated!
