It turns out that the hidden state of the special beginning-of-sequence token in this model stays almost identical regardless of the input text, which is likely why the outputs look the same. So the beginning token's embedding can't be used to represent the whole sequence in this model. Got the answer from python - Mistral model generates the same embeddings for different input texts - Stack Overflow.
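
A minimal sketch to illustrate the point (assuming the Hugging Face `transformers` library and a Mistral checkpoint such as `mistralai/Mistral-7B-v0.1`, which are my assumptions, not taken from the answer above): it compares the hidden state of the `<s>` (BOS) token across two inputs, and also builds a sequence embedding by mean-pooling the remaining token states, which does differ between texts.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint; other Mistral models should behave similarly.
model_name = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

texts = ["The cat sat on the mat.", "Quantum computing uses qubits."]

bos_states, pooled = [], []
with torch.no_grad():
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt")          # tokenizer prepends <s>
        hidden = model(**inputs).last_hidden_state[0]           # (seq_len, hidden_dim)
        bos_states.append(hidden[0])                            # hidden state of the <s> token
        pooled.append(hidden[1:].mean(dim=0))                   # mean-pool the actual text tokens

cos = torch.nn.CosineSimilarity(dim=0)
# The BOS states come out nearly identical for both texts, so they carry
# essentially no information about the sequence; the mean-pooled states differ.
print("BOS-token similarity  :", cos(bos_states[0], bos_states[1]).item())
print("Mean-pooled similarity:", cos(pooled[0], pooled[1]).item())
```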