JSON response for pdf text data

Sajeeda · June 10, 2024, 12:42pm

Hi All,

I have a large amount of text extracted from PDFs using OCR, and I want to train a model on this data together with a prompt that provides definitions for the data points to be extracted from the given text.

The response is supposed to follow the format defined below:
{
"profile": "PROFILE",
"disputeType": "DISPUTE_TYPE",
"verdict": "VERDICT",
"amountType": "AMOUNT_TYPE",
"amount": "AMOUNT, Present the amount as number",
"defaultDate": "DEFAULT_DATE, Present the date or year in YYYY-mm-dd Format",
"noticeDate": "NOTICE_DATE, Present the date or year in YYYY-mm-dd Format"
}
The values in the JSON, such as PROFILE, are supposed to be filled in based on the definitions given in the prompt text.
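
Whichever model produces the output, a small check on the returned JSON can catch malformed responses early. The following is only a minimal sketch in Python, assuming the seven keys from the format above, that amounts may arrive as strings with thousands separators, and that both dates are full dates rather than a bare year. The function name `validate_response` is hypothetical and not part of any library.

```python
import json
from datetime import datetime

# Keys taken from the response format described in the post.
EXPECTED_KEYS = {
    "profile", "disputeType", "verdict", "amountType",
    "amount", "defaultDate", "noticeDate",
}


def validate_response(raw: str) -> dict:
    """Parse a model response and check it against the expected format.

    Raises ValueError if the response is not valid JSON, has missing or
    unexpected keys, a non-numeric amount, or a date that is not YYYY-MM-DD.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("response is not a JSON object")

    missing = EXPECTED_KEYS - data.keys()
    extra = data.keys() - EXPECTED_KEYS
    if missing or extra:
        raise ValueError(f"missing keys: {sorted(missing)}, extra keys: {sorted(extra)}")

    # Assumption: amounts may come back as strings such as "1,250.50".
    data["amount"] = float(str(data["amount"]).replace(",", ""))

    # Assumption: both dates are full dates; a bare year would be rejected here.
    for key in ("defaultDate", "noticeDate"):
        datetime.strptime(data[key], "%Y-%m-%d")

    return data
```

Responses that fail this check could be retried or set aside for manual review.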

We were able to achieve this using a fine-tuned model from OpenAI, and we now want to achieve the same using an open-source model.
Which model would be the best one to use for this?
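
For fine-tuning an open-source model, the training data might be prepared as prompt/completion pairs, one JSON object per line. This is a rough sketch only: the field names `prompt` and `completion`, the prompt layout, and the helper names `build_record` and `to_jsonl` are all assumptions, and the exact record format depends on the training framework used.

```python
import json


def build_record(definitions: str, ocr_text: str, labels: dict) -> dict:
    """Combine the definitions prompt, the OCR text and the expected JSON
    into one training record (hypothetical layout)."""
    prompt = f"{definitions}\n\nText:\n{ocr_text}\n\nJSON:"
    # The target the model should learn to produce: the labels as a JSON string.
    completion = json.dumps(labels, ensure_ascii=False)
    return {"prompt": prompt, "completion": completion}


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

Each labelled document would become one record, and the resulting string can be written to a `.jsonl` file.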

nielsr · June 10, 2024, 1:03pm

Hi,

See my answer here: Fine tune LLMs on PDF Documents - #9 by nielsr
