Text to structure: a way to standardize outputs

dpiret · February 18, 2023, 6:32am

Total newbie here.
My goal is to convert an input text to a standardized structure that would allow me, later on, to process tabulated data in JSON format.

For example,

Input: “Give me a list of all clients having purchased milk”
Output: {"intention": "retrieve", "object": "client", "conditions": ["purchase", "milk"]}
Input: “Please, machine, do me a favor and delete users not having logged in after 2022”
Output: {"intention": "delete", "object": "user", "conditions": ["logged-in", "2022-12-31"]}

The output JSON structure has fixed keys (intention, object, conditions). Values can be either discrete (for example, intention can only be one of ['retrieve', 'delete', 'modify']) or free-form (for example, conditions can contain any piece of data).
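
To make the target structure concrete, here is a minimal sketch of a check for it in Python. The function name `validate_command` and the exact error messages are made up for illustration; only the three keys and the three allowed intention values come from the description above. It assumes "object" is always a string and "conditions" is always a list, which the two examples suggest but the post does not state.

```python
import json

# Allowed values for the discrete field, taken from the post above.
ALLOWED_INTENTIONS = {"retrieve", "delete", "modify"}
REQUIRED_KEYS = {"intention", "object", "conditions"}


def validate_command(raw: str) -> dict:
    """Parse a JSON string and check it against the fixed structure.

    Hypothetical helper: raises ValueError if the string is not valid JSON
    or does not match the expected keys and value types.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != REQUIRED_KEYS:
        raise ValueError(f"expected keys {sorted(REQUIRED_KEYS)}, got {sorted(data)}")
    intention = data["intention"]
    if not isinstance(intention, str) or intention not in ALLOWED_INTENTIONS:
        raise ValueError(f"unknown intention: {intention!r}")
    if not isinstance(data["object"], str):
        raise ValueError("object must be a string")  # assumption, see lead-in
    if not isinstance(data["conditions"], list):
        raise ValueError("conditions must be a list")  # assumption, see lead-in
    return data
```

Whatever ends up producing the JSON (NER plus rules, or a language model), running its output through a check like this makes it easy to spot outputs that drift from the structure.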

My approach would be to use named entity recognition (NER) to identify the relevant entities and their properties, and syntactic parsing to determine the structure of the user’s prompt. For example, “Give me a list” would set intention to retrieve.

After reading, watching, and practicing, I think I’m now totally lost and not even sure the NER approach is advisable in this context.

Any help would be much appreciated

kamranh · November 23, 2023, 10:28pm

For your task you can use LangChain output parsers. You’d have to set up a local or huggingfacehub pipeline to access your model from Hugging Face. I am not sure if Hugging Face has a similar library.
The biggest issue you would run into is that LangChain works best with ChatGPT, and some of the simpler text-to-text models available, like Falcon-7b or Google T5, are bad at structuring results as JSON (I would love to know if someone can recommend a simple pretrained language model that can structure output as JSON).
Hope this helps! I noticed your issue didn’t get any activity for a long time.

Sotiris112 · July 20, 2024, 9:58pm

Hi,
Did you ever figure it out? I am having the same problem.

dpiret · July 21, 2024, 10:59am

Everything I tried failed, so I ended up using OpenAI, asking in the system prompt for a reply in a JSON structure. It doesn’t work perfectly, but you can turn the string into valid JSON with some processing of the response.
There is also JSON mode, but the responses were much poorer for my requirements. You may also want to look into function calling.
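
The post does not say what the "processing of the response" looked like. One possible version, sketched here under the assumption that the model wraps an otherwise valid JSON object in extra chatter, is to scan the reply for the first decodable object using the standard library's `json.JSONDecoder.raw_decode`. The helper name `extract_json_object` is hypothetical.

```python
import json


def extract_json_object(text: str) -> dict:
    """Return the first JSON object embedded in a model reply.

    Hypothetical helper: tries to decode starting at each '{' in turn and
    returns the first object that parses. Raises ValueError if none does.
    """
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # ignores whatever text follows it.
            obj, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pass  # not valid JSON from this position; try the next '{'
        else:
            if isinstance(obj, dict):
                return obj
        start = text.find("{", start + 1)
    raise ValueError("no JSON object found in response")


reply = (
    "Sure, here is the structure you asked for: "
    '{"intention": "delete", "object": "user", '
    '"conditions": ["logged-in", "2022-12-31"]} Hope that helps.'
)
print(extract_json_object(reply)["intention"])
```

This only recovers JSON that is already valid inside the reply; it will not repair things like single-quoted strings or trailing commas, which would need extra handling.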
