JSON response for PDF text data

Hi All,

I have a large amount of text extracted from PDFs using OCR. I want to train a model on this data, together with a prompt that provides definitions for the data points to be extracted from the given text.

The response is supposed to follow the format defined below:
{
  "profile": "PROFILE",
  "disputeType": "DISPUTE_TYPE",
  "verdict": "VERDICT",
  "amountType": "AMOUNT_TYPE",
  "amount": "AMOUNT, Present the amount as number",
  "defaultDate": "DEFAULT_DATE, Present the date or year in YYYY-mm-dd Format",
  "noticeDate": "NOTICE_DATE, Present the date or year in YYYY-mm-dd Format"
}
The values for placeholders such as PROFILE are supposed to be filled in according to the definitions given in the prompt text.
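
To make the format concrete, here is a minimal sketch of how a model reply could be checked against it. The key names come from the format above; the function name validate_response and the specific checks are my own assumptions. In particular it assumes the amount is a plain number and that both dates are full YYYY-mm-dd dates, so a year-only value would be rejected.

```python
import json
from datetime import datetime

# Key names are taken from the format in the post; the rest is illustrative.
EXPECTED_KEYS = [
    "profile", "disputeType", "verdict", "amountType",
    "amount", "defaultDate", "noticeDate",
]
DATE_KEYS = ("defaultDate", "noticeDate")


def validate_response(raw: str) -> dict:
    """Parse a model reply and check it against the expected format."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in EXPECTED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    # Assumption: the amount is a plain number, e.g. "1500" or 1500.50.
    data["amount"] = float(data["amount"])
    for key in DATE_KEYS:
        # Assumption: full dates only; strptime raises ValueError otherwise.
        datetime.strptime(data[key], "%Y-%m-%d")
    return data
```

A check like this works the same way whichever model produced the reply, so it can be used to compare the OpenAI fine-tune with an open-source candidate on the same documents.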

We were able to achieve this using a fine-tuned model from OpenAI, and want to achieve the same using an open-source model.
Which model would be best suited for this?
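
For reference, this is roughly how the training data could be arranged as prompt/completion pairs in JSONL, independent of which model ends up being used. It is only a sketch: the prompt wording, the make_record helper, and the "prompt"/"completion" field names are assumptions, and different fine-tuning tools expect different record layouts.

```python
import json

# Hypothetical prompt wording; the real definitions come from our own prompt text.
PROMPT_TEMPLATE = (
    "Extract the following data points from the text and reply with JSON only.\n"
    "Definitions:\n{definitions}\n\nText:\n{text}\n"
)


def make_record(definitions: str, text: str, labels: dict) -> str:
    """Return one JSONL line pairing the prompt with the expected JSON answer."""
    prompt = PROMPT_TEMPLATE.format(definitions=definitions, text=text)
    # The completion is the target JSON, serialized as a string.
    completion = json.dumps(labels, ensure_ascii=False)
    record = {"prompt": prompt, "completion": completion}
    return json.dumps(record, ensure_ascii=False)
```

Each OCR document with its labelled values becomes one line of the file, for example make_record(definitions, ocr_text, {"profile": "...", "verdict": "..."}).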


See my answer here: Fine tune LLMs on PDF Documents - #9 by nielsr