Train/fine-tune an LLM to answer a set of questions on unstructured PDFs

Hi, I am having a hard time finding information on a specific task I want an open-source LLM to perform.
I have numerous unstructured PDFs (company annual financial reports, hundreds of pages each) and, for each one, the answers to the same set of questions (e.g. What is the percentage of women on the board? (in %), Is biodiversity mentioned in the report? (Yes/No)). The questions mostly ask for a boolean or a number to retrieve. I want to train my model to answer these same questions more accurately for other unstructured PDFs. I have tried a basic LLM PDF chat app, but the results are really bad.

Even after browsing the internet for days, I can't figure out which solution best fits my problem or how to implement it.
Thanks in advance, any advice is welcome!

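Embed your data and use RAG (retrieval-augmented generation) to retrieve the documents, or chunks of them, that are relevant to the query. Add the contents of these to your prompt so the model answers from the report text itself. Use a recent model such as Mistral/Mixtral. LangChain has a guide for RAG.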
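To make the retrieval step concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: `embed` is a toy word-count stand-in for a real embedding model, the sample chunks are made up, the helper names (`chunk_text`, `retrieve`, `build_prompt`) are hypothetical, and the final call to the LLM is left out. In practice you would swap in a proper embedding model and send the prompt to something like Mistral.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=40, overlap=10):
    """Split text into overlapping word windows (sizes here are arbitrary)."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]


def embed(text):
    """ASSUMPTION: toy stand-in for an embedding model (lowercase word counts)."""
    return Counter(re.findall(r"[a-z0-9%]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question, context_chunks):
    """Put the retrieved text in front of the question."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, reply 'not found'.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Made-up chunks standing in for text extracted from one annual report.
report_chunks = [
    "The board of directors has 12 members, of whom 5 are women (42%).",
    "Revenue grew 8% year over year, driven by the retail segment.",
    "Our biodiversity programme restored 300 hectares of wetland.",
]
question = "What is the percentage of women on the board?"
top = retrieve(question, report_chunks, k=1)
prompt = build_prompt(question, top)  # this string would be sent to the LLM
```

Since you already have the correct answers for a set of reports, you can run each question through this pipeline and compare the model's output against them to see whether retrieval or the model itself is the weak point.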