I would like to learn and understand how to address the questions below. Can someone please help me?
- Currently, I use my data (20 files) to create embeddings with HuggingFaceEmbeddings. If I have 2 million files, do I still need to follow the same steps: 1. create embeddings with HuggingFaceEmbeddings, 2. run a similarity search, and 3. pass the results to the model?
- At what stage do I need to retrain the LLM?
- Is it possible to retrain the LLM with my own data?
- Currently, your notebook uses ChromaDB as the vector DB. If I want to move this to production, how do I host it? Where do I store all my data (embeddings)? Do I need to store all the embeddings in a database? If so, could you please recommend one?
- How do I evaluate the Dolly LLM with my data?
- Currently, I noticed that the Dolly model gives one wrong answer with my data. How do I correct the model? With another kind of model, such as a text classifier, I would correct the label and retrain the model with the corrected label. How do I do that here?
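
To make the first question concrete, the three steps (embed, similarity search, pass to the model) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the notebook's code: `embed` is a toy word-count stand-in for HuggingFaceEmbeddings, the in-memory `index` list stands in for the ChromaDB collection, and `retrieve`, `build_prompt`, and the sample documents are hypothetical names and data.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, index, k=2):
    """Step 2: rank stored documents by similarity to the question."""
    query_vec = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, passages):
    """Step 3: assemble the text that would be passed to the LLM."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Hypothetical sample data, standing in for the real files.
documents = [
    "Invoices are paid within 30 days.",
    "The office is closed on public holidays.",
    "Refunds are issued to the original payment method.",
]

# Step 1: embed every document once and keep the vectors in an index.
index = [(doc, embed(doc)) for doc in documents]

question = "When are invoices paid?"
passages = retrieve(question, index, k=1)
prompt = build_prompt(question, passages)
print(prompt)
```

In this sketch the index is built once and only the question is embedded at query time, which is the part of the flow the question about scaling from 20 files to 2 million files is asking about.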