The current MTEB Leaderboard is dominated by LLM-based text embedding models, demonstrating their effectiveness in this field. However, using these embeddings in real-world projects can be expensive due to their high dimensionality (often 4096, 3584, or even larger).
Recently, I’ve been experimenting with dimensionality reduction techniques for LLM text embeddings, motivated by the desire for greater efficiency. I explored methods inspired by two papers: “Matryoshka Representation Learning” and “Espresso Sentence Embeddings”.
However, thanks to a bug in my code, I stumbled upon a surprising discovery. It turns out that simply truncating (or pruning) the embedding vector by position yields results comparable to using the full-size vector!
- Truncation/pruning can select the first X dimensions, the last X dimensions, a segment from the middle, or even elements at arbitrary positions within the vector (see the sketch below).
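To make the idea concrete, here is a minimal sketch of what I mean by positional truncation/pruning. The embedding here is just a random placeholder standing in for a real model's output, and the dimensions (4096, 512) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
full = rng.normal(size=4096)   # placeholder for a 4096-d LLM text embedding
k = 512                        # target dimensionality after pruning

first_k  = full[:k]                               # first k dimensions
last_k   = full[-k:]                              # last k dimensions
middle_k = full[2048 - k // 2 : 2048 + k // 2]    # a k-d slice from the middle
random_k = full[rng.choice(full.size, size=k, replace=False)]  # arbitrary positions

# If downstream scoring uses cosine similarity, re-normalizing the pruned
# vector keeps scores on the same scale as the full embedding.
def l2_normalize(v: np.ndarray) -> np.ndarray:
    return v / np.linalg.norm(v)

pruned = l2_normalize(first_k)
```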
I tested this approach with various models, including a Vistral text embedding model (fine-tuned from Vistral 7B Chat), gte-qwen2-1.5b-instruct, and multilingual BERT, and all of them showed similar results.
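If you want to reproduce the comparison yourself, a rough sanity check is to compare nearest-neighbour rankings produced by the full and truncated embeddings. This is only a sketch: the model id is an assumption (any model loadable with sentence-transformers works), and the tiny corpus is made up for illustration:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumed model id; swap in whichever embedding model you are evaluating.
model = SentenceTransformer("Alibaba-NLP/gte-Qwen2-1.5B-instruct")

corpus = ["The cat sat on the mat.", "Stock prices fell sharply today.",
          "A kitten is resting on a rug.", "The weather is sunny and warm."]
query = "a cat lying on a carpet"

doc_emb = model.encode(corpus, normalize_embeddings=True)     # (n_docs, d)
q_emb = model.encode([query], normalize_embeddings=True)[0]   # (d,)

def truncate(v: np.ndarray, k: int) -> np.ndarray:
    """Keep the first k dimensions and re-normalize for cosine similarity."""
    v = v[..., :k]
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def rank(q: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Rank documents by cosine similarity (vectors already L2-normalized)."""
    return np.argsort(-(docs @ q))

k = 256
print("full-size ranking:", rank(q_emb, doc_emb))
print("truncated ranking:", rank(truncate(q_emb, k), truncate(doc_emb, k)))
```

In my experiments, the rankings (and downstream retrieval metrics) stayed surprisingly close between the two settings.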
This finding has left me bewildered. Why is this happening? Could it be that the information is distributed so evenly across the vector's dimensions that truncation/pruning has little impact compared to the full-size representation?
Does this mean that sophisticated dimensionality reduction algorithms and techniques are no longer necessary?
I’m eager to hear your thoughts and insights on this unexpected observation. Please share your opinions in the comments!