Understanding regarding "Question Answering model using open-source LLM"

Iamexperimenting · May 3, 2023, 9:56pm

Hi,

I would like to learn how to address the questions below. Can someone please help?

Currently, I use my data (20 files) to create embeddings with HuggingFaceEmbeddings. Even if I had 2 million files, would I need to follow the same steps: 1. create embeddings with HuggingFaceEmbeddings, 2. run a similarity search, and 3. pass the results to the model?
At what stage do I need to retrain the LLM?
Is it possible to retrain the LLM with my own data?
Currently, your notebook shows ChromaDB as the vector DB. If I want to move it to production, how do I host it? Where do I store all my data (embeddings)? Do I need to store all the embeddings in a database? If yes, could you please recommend one?
How do I evaluate the Dolly LLM with my data?
Currently, I noticed that the Dolly model gives one wrong answer with my data. How do I correct the model? With another kind of model, such as text classification, I would correct the label and retrain the model with the corrected label. How do I do that here?
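
To make the three steps from the first question concrete, here is a rough, self-contained sketch of the flow (embed, similarity search, build the input for the model). It is only an illustration under loud assumptions: `embed` is a toy bag-of-words stand-in for HuggingFaceEmbeddings, the in-memory `index` list stands in for the vector DB, the sample documents are made up, and no LLM is called; the prompt string is simply what would be passed to a model such as Dolly.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, index, k=1):
    """Step 2: rank stored documents by similarity to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, passages):
    """Step 3: assemble the text that would be passed to the model."""
    context = "\n".join(passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Step 1: embed every document once and keep the vectors (hypothetical data).
docs = [
    "The vector store holds one embedding per document chunk",
    "The language model generates the final answer",
]
index = [(d, embed(d)) for d in docs]

question = "Where is each embedding kept?"
top = retrieve(question, index)
prompt = build_prompt(question, top)
print(prompt)
```

In this sketch the documents are embedded once up front and only the question is embedded per query, so the per-question work is the similarity search plus the model call.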
