Hi,
I’m planning to use Hugging Face Inference for Pros with one of the Llama 2 models and one of the Llama 2 embeddings models for a Retrieval-Augmented Generation (RAG) prototype.
- Llama 2 embeddings model: shalomma/llama-7b-embeddings
- Llama 2 model: Riiid/sheep-duck-llama-2-70b-v1.1
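For context, here is a rough sketch of the retrieval flow I have in mind. It is only an illustration under stated assumptions: `embed` is a toy bag-of-words stand-in for the embeddings model (not the real model or any Hugging Face API), and `retrieve` and `build_prompt` are hypothetical helper names. In the real prototype the prompt text would be sent to the Llama 2 model.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for the embeddings model: a bag-of-words count vector.
    (Assumption: the real prototype would call the embeddings model here.)"""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, docs, k=1):
    """Assemble the text prompt that would be sent to the LLM."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


docs = [
    "Llama 2 is a family of language models",
    "Paris is the capital of France",
]
print(build_prompt("capital of France", docs))
```

In this sketch the embeddings model is only used to rank documents, and the main model only ever receives plain text.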
My concerns about this approach are:
- Are the models above compatible with each other?
- Can Inference for Pros handle them, considering that the main model has 70B params?
- Is there a better embeddings model to use with Llama 2 LLMs? Or would running llama.cpp locally for embeddings be a better approach?
Please advise.
Thanks & Regards,
Deekshith