It turns out that the hidden state of the special beginning-of-sequence token in this model stays almost identical regardless of the input text, which is likely why the outputs look the same. So the beginning token's embedding can't be used to represent the whole sequence in this model. Got the answer from python - Mistral model generates the same embeddings for different input texts - Stack Overflow.
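
A minimal sketch to illustrate the point (assuming the Hugging Face `transformers` library and a Mistral checkpoint such as `mistralai/Mistral-7B-v0.1`, which are my assumptions, not taken from the answer above): it compares the hidden state of the `<s>` (BOS) token across two inputs, and also builds a sequence embedding by mean-pooling the remaining token states, which does differ between texts.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint; other Mistral models should behave similarly.
model_name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

texts = ["The cat sat on the mat.", "Quantum computing uses qubits."]

bos_states, pooled = [], []
with torch.no_grad():
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt")          # tokenizer prepends <s>
        hidden = model(**inputs).last_hidden_state[0]           # (seq_len, hidden_dim)
        bos_states.append(hidden[0])                            # hidden state of the <s> token
        pooled.append(hidden[1:].mean(dim=0))                   # mean-pool the actual text tokens

cos = torch.nn.CosineSimilarity(dim=0)
# The BOS states come out nearly identical for both texts, so they carry
# essentially no information about the sequence; the mean-pooled states differ.
print("BOS-token similarity  :", cos(bos_states[0], bos_states[1]).item())
print("Mean-pooled similarity:", cos(pooled[0], pooled[1]).item())
```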