Seeking assistance to extract specific information from the given prompt without generating new content

Divya07 · June 14, 2023, 10:58am

Hey everyone!

I’m currently working on a project that involves extracting questions and answers from an exam question PDF document and storing each question (along with options, answer, and explanation) in JSON format. To accomplish this, I’m using regular expressions to identify the questions in the text, and then utilizing the OpenAI GPT-3.5 Turbo model to generate structured outputs in JSON format for each question.

However, I’m encountering a specific issue with the model. Even though I’ve provided clear instructions in the prompt to only extract options if they are explicitly available in the text, the model still generates options from the explanation section. I want to ignore questions which do not have options in proper format (because some options are in the form of images).

Here’s the prompt I’m using: “Extract information from text. {format_instructions} The response should be presented in a markdown JSON codeblock. Question description: {inputText}.Please remember that if options are not explicitly present in the prompt text in the form of ‘A. option_a_text’, ‘B. option_b_text’ and so on, do not extract ‘options’ from answer/explanation and set the ‘options’ field as an empty object and provide ‘result’ field as ‘failed’. If options are present, provide the correct ‘ans’ field with valid options (a, b, c, d, e) and provide ‘result’ field as ‘success’. Do not make up options or answers or explanation.”

I would greatly appreciate your assistance in understanding why the model is generating options in violation of the given instructions and if there are any potential solutions or alternative approaches that can improve the accuracy of option extraction.

PS: I am using zod for schema validation and StructuredOutputParser from the langchain/output_parsers module to parse the output generated by the OpenAI GPT-3.5 model.
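
One way to approach the problem described above without depending on the model obeying the instruction is to check for options deterministically before the model is called, and to skip (or mark as failed) any question whose options are not explicitly present. The following is only a minimal sketch in Python, not the poster's code. It assumes options appear on their own lines as "A. text" through "E. text", and that the answer/explanation section starts with a line beginning with "Answer" or "Explanation"; both patterns and the function name `extract_explicit_options` are hypothetical and would need adjusting to the real PDF text.

```python
import re

# Assumed option format: a line that starts with "A." .. "E." followed by text.
OPTION_RE = re.compile(r"^[ \t]*([A-E])\.[ \t]+(\S.*)$", re.MULTILINE)

# Assumed (hypothetical) markers for the start of the answer/explanation section.
SECTION_RE = re.compile(r"^[ \t]*(answer|explanation)\b", re.IGNORECASE | re.MULTILINE)


def extract_explicit_options(text: str) -> dict:
    """Return options found before the answer/explanation section, or {}."""
    section = SECTION_RE.search(text)
    # Only look at the part of the text that precedes the answer/explanation.
    question_part = text[: section.start()] if section else text

    options = {}
    for label, body in OPTION_RE.findall(question_part):
        # Keep the first occurrence of each label.
        options.setdefault(label.lower(), body.strip())

    # Require at least two options labelled consecutively starting from "a".
    labels = sorted(options)
    expected = [chr(ord("a") + i) for i in range(len(labels))]
    if len(labels) < 2 or labels != expected:
        return {}
    return options
```

If this returns an empty dict, the question can be recorded with empty 'options' and 'result' set to 'failed' without sending it to the model at all; otherwise the already-extracted options can be passed along so the model only has to fill in the remaining fields.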

PranavBala · May 24, 2024, 5:43am

Hey, did you find a solution for this?

RaushanTurganbay · May 24, 2024, 8:45am

Hey!

To generate guaranteed structured output from LLMs you can use one of the following tools:

Outlines: Can be easily used with transformers and offers constraints with JSON, regex and more.
Context-free-grammar-based tool: Similar to Outlines, but works only with context-free-grammar constraints. Compared to Outlines, this one is better and faster for complex rules/constraints.
JsonFormer: Specifically crafted for the JSON format. I haven’t tried it personally yet.

In general, I would recommend using either Outlines or JsonFormer if you’re trying to generate a dict-like object.
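
Note that constraining the output format guarantees the shape of the JSON, not that the option text was actually copied from the question. A complementary check, sketched below in Python under stated assumptions, is to validate the model's output against the source text afterwards. The field names 'options', 'ans' and 'result' come from the prompt in the original post; the function name `validate_extraction` and the whitespace/case normalization are illustrative choices, and `question_text` is assumed to be only the part of the input that precedes the answer/explanation.

```python
import json


def _normalize(s: str) -> str:
    """Collapse whitespace and lowercase so minor formatting differences do not matter."""
    return " ".join(s.split()).lower()


def validate_extraction(raw_json: str, question_text: str) -> dict:
    """Mark the record as failed unless every option appears verbatim in question_text."""
    record = json.loads(raw_json)
    options = record.get("options") or {}
    source = _normalize(question_text)

    # An option counts as grounded only if its (non-empty) text occurs in the source.
    grounded = bool(options) and all(
        isinstance(value, str) and value.strip() and _normalize(value) in source
        for value in options.values()
    )
    # The answer must refer to one of the extracted option labels.
    ans = str(record.get("ans", "")).lower()
    ans_valid = ans in {str(key).lower() for key in options}

    if grounded and ans_valid:
        record["result"] = "success"
    else:
        record["options"] = {}
        record["result"] = "failed"
    return record
```

This is a substring check, so very short options (a single digit, for example) can match by accident; a stricter variant could require the "A. text" line form instead.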
