Domain-specific word similarity problem

I am trying to create a chat-bot like application (inspired by chatGPT). The bot or application should be able to answer questions about our software on basis of help documents.
I have tried to finetune QuestionAnswering models like distilbert_base_uncased on less than 100 annotated samples. But my model performance is not great. Can anyone suggest alternative approaches?

Hi Vikassss,

Are you talking about the performance of the Q&A engine applied on a test dataset or more generally after deployment?

In the second case, the low performance could be originated in different parts of the pipeline, not only the model. For example:
1- what are you using as the retriever?
2- what is your ranking strategy for the context?
3- same question about the reader?

If your fine-tuned model is “forced” to find answers in non-optimal ranked contexts, it will fail.
Could you please tell us more about your evaluation methodology?

Thanks

Best Regards

Jerome

The most concrete suggestion I have would be to fine-tune the embeddings model on larger samples. For domain-specific use cases it’ll be really important to give as much of the domain-specific context as possible. Also, for my learning, what service are you using to fine-tune distilbert_base_uncased?