I’m looking to use Hugging Face Inference for PROs with one of the Llama 2 models and one of the Llama 2 embedding models for a Retrieval-Augmented Generation (RAG) prototype.
- Llama 2 embeddings model - shalomma/llama-7b-embeddings · Hugging Face
- Llama 2 model - Riiid/sheep-duck-llama-2-70b-v1.1 · Hugging Face
My concerns about this approach are:
- Are the models above compatible with each other?
- Can Inference for PROs handle them, considering that the main model has 70B parameters?
- Is there a better embeddings model to use with Llama 2 LLMs? Or is running llama.cpp locally for embeddings a better approach?
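For context, the retrieval step I have in mind is a minimal sketch like the one below: embed the documents and the query, then rank documents by cosine similarity. The toy vectors here stand in for real embeddings (in the prototype they would come from the embeddings model, e.g. via `huggingface_hub.InferenceClient.feature_extraction`); the helper itself is model-agnostic.

```python
import numpy as np

def retrieve(query_vec, doc_vecs, k=1):
    """Return indices of the k documents most similar to the query.

    Similarity is cosine similarity: normalize both sides, then
    take dot products and sort in descending order.
    """
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    return np.argsort(scores)[::-1][:k]

# Toy stand-ins for embedding vectors; real ones would be much higher-dimensional.
docs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query = np.array([0.9, 0.1])
print(retrieve(query, docs, k=1))  # index of the nearest document
```

The retrieved passages would then be prepended to the prompt sent to the 70B generation model, so the embeddings model and the LLM only need to agree on the text, not on any shared vector space.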
Thanks & Regards,