I have a task to create a chatbot in my native language: Bulgarian. I was checking models out but did not find any in my language. So, I guess I have to train a model in Bulgarian for my chatbot, or I was thinking of somehow incorporating embeddings from a transformer trained in Bulgarian into a Llama model. I actually don’t know if this will work, since I haven’t created a chatbot before. So, I would appreciate any advice or sources (code and text) that I can learn from.
I am not sure what you mean by incorporating embeddings from a Bulgarian model into Llama. How would you do such a thing?
I guess the best approach would be to find a chat dataset in Bulgarian and fine-tune Llama on it. If you lack the resources (big GPUs) for that, you probably want to look into PEFT, or even use an already quantized model and then fine-tune it with PEFT. More on the topic can be found here.
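To make that more concrete, here is a rough sketch of what a LoRA (PEFT) fine-tune of Llama on a Bulgarian chat dataset could look like with `transformers` and `peft`. This is not a complete recipe: the model ID is the gated Llama 2 checkpoint (you need to accept Meta's license on the Hub), `bg_chat.jsonl` is a placeholder for your own dataset with a `"text"` field, and the hyperparameters are just common starting points, not tuned values. It also assumes you have a GPU with enough memory for the 7B model in fp16.

```python
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from peft import LoraConfig, TaskType, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # gated repo: accept the license on the Hub first
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token  # Llama has no pad token by default

model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.float16, device_map="auto"
)

# LoRA: train small adapter matrices on the attention projections
# instead of all 7B parameters.
peft_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, peft_cfg)
model.print_trainable_parameters()  # only a small fraction is trainable

# Placeholder dataset: a JSONL file where each line has a "text" field
# containing one chat exchange in Bulgarian.
ds = load_dataset("json", data_files="bg_chat.jsonl")

def tokenize(example):
    return tok(example["text"], truncation=True, max_length=512)

ds = ds.map(tokenize, remove_columns=ds["train"].column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama-bg-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,  # effective batch size of 8
        num_train_epochs=1,
        learning_rate=2e-4,
        fp16=True,
    ),
    train_dataset=ds["train"],
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
model.save_pretrained("llama-bg-lora")  # saves only the small adapter weights
```

Since only the adapter weights are saved, the output is a few hundred MB rather than the full model; at inference time you load the base model and attach the adapter with `PeftModel.from_pretrained`.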
I hope that helps.