Hi, I’m new to the platform, and I’m trying to build a RAG app using a Word document as the knowledge base and Llama as the LLM.
To embed the text I’d like to use a free model implementation, such as HuggingFaceEmbeddings, but most of the documentation I can find is a bit confusing regarding imports and the newest version.
Some sources:
from langchain_huggingface import HuggingFaceEmbeddings
Others:
from langchain.embeddings import HuggingFaceEmbeddings
Is there a “source of truth” I can check so I can finish my project?
It looks like both imports come from third-party libraries, and maybe both are valid.
I’m not familiar with embeddings or LangChain myself, so let’s wait for someone more knowledgeable to come by.
I found a couple of official HF introductory articles.
Personally, I’ve felt that LangChain has pretty fast-paced updates, some of which change the internal structure of the library itself (for example, where the classes are located). Typically they will mark a particular class as deprecated and suggest alternatives in their official API documentation.
I would say it depends on which version of LangChain you use; depending on your version, both imports may work, or only one of the two. I would recommend updating the library and sticking to the newest import paths unless you run into bugs, as in the sketch below.
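For reference, here is a minimal sketch of the newer import path, assuming a recent LangChain install with the langchain-huggingface partner package; the model name is just an example, and the older langchain_community import is shown as a commented fallback.

# pip install -U langchain-huggingface sentence-transformers
from langchain_huggingface import HuggingFaceEmbeddings
# Older versions expose the same class via:
# from langchain_community.embeddings import HuggingFaceEmbeddings

# Example model; any sentence-transformers checkpoint should work here.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Embed a query and some document chunks.
query_vector = embeddings.embed_query("What does my document say about pricing?")
doc_vectors = embeddings.embed_documents(["First chunk of the Word doc.", "Second chunk."])
print(len(query_vector))  # 384 dimensions for this particular model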
Personally, I recommend Chroma DB for RAG apps: it can turn any BERT-style encoder into an embedding model by mean-pooling the embedding layer, it has support for many third parties including SentenceTransformers and Hugging Face Transformers, and it also supports conditional (metadata-based) retrieval.
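As a rough sketch of what that can look like (assuming the chromadb and sentence-transformers packages are installed; the collection name, document chunks, and metadata below are made up for illustration):

import chromadb
from chromadb.utils import embedding_functions

# Use a SentenceTransformers model as the embedding function (example model name).
embed_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

client = chromadb.PersistentClient(path="./rag_db")  # persists vectors on disk
collection = client.get_or_create_collection(name="word_doc", embedding_function=embed_fn)

# Add chunks of the Word document together with metadata for later filtering.
collection.add(
    ids=["chunk-1", "chunk-2"],
    documents=["Text of the first chunk...", "Text of the second chunk..."],
    metadatas=[{"section": "intro"}, {"section": "pricing"}],
)

# Conditional (metadata-filtered) retrieval: only search chunks from the "pricing" section.
results = collection.query(
    query_texts=["How much does the service cost?"],
    n_results=1,
    where={"section": "pricing"},
)
print(results["documents"])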