For the EmbeddingGemma model, we can add prompts for specific tasks (see https://ai.google.dev/gemma/docs/embeddinggemma/model_card#prompt-instructions).
Two of them are:
| Use case | Description | Prompt |
|:---|:---|:---|
| Clustering | Used to generate embeddings that are optimized to cluster texts based on their similarities | `task: clustering \| query: {content}` |
| Semantic Similarity | Used to generate embeddings that are optimized to assess text similarity. This is not intended for retrieval use cases. | `task: sentence similarity \| query: {content}` |
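To make the prompt format concrete, here is a minimal sketch, assuming each prompt is applied as a plain-text prefix exactly as written in the table, with `{content}` replaced by the input text. `PROMPT_TEMPLATES` and `build_inputs` are hypothetical names for illustration, not part of any library.

```python
# Hypothetical helper: the template strings are copied from the model card table;
# the dictionary keys and the function name are illustrative only.
PROMPT_TEMPLATES = {
    "clustering": "task: clustering | query: {content}",
    "sentence_similarity": "task: sentence similarity | query: {content}",
}


def build_inputs(texts: list[str], task: str) -> list[str]:
    """Prefix each input text with the prompt template for the chosen task."""
    template = PROMPT_TEMPLATES[task]
    # str.format only interprets braces in the template, so braces inside
    # the input text itself are passed through unchanged.
    return [template.format(content=text) for text in texts]


if __name__ == "__main__":
    sentences = ["The cat sat on the mat.", "A kitten rested on the rug."]
    for line in build_inputs(sentences, "clustering"):
        print(line)
```

The two prompts differ only in the `task:` prefix, so any difference between the resulting embeddings comes from how the model was trained to respond to that prefix. The prefixed strings would then be passed to the embedding model; with sentence-transformers, for example, `SentenceTransformer.encode` also accepts a `prompt` argument so the prefix can be supplied separately from the texts.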
But when clustering, you essentially want to group sentences with similar semantic meanings together, so it seems to be the same as semantic similarity. What is the difference between the Clustering and Semantic Similarity embeddings?
If I want to cluster sentences with similar semantic meaning, which prompt should I use?