How to set up JSON-based workflow/flowchart generation from a user prompt?

So, I’m relatively new to AI. I’ve built a chatbot or two and have worked with function calling in Gemini and OpenAI models.

I’m trying to do something more daring: I want the AI to generate an entire flowchart based on user prompts. Each node in the flowchart is highly customizable and essentially has its own config.

I thought about using a vector DB to store the instructions for each node, plus a basic example of its inputs and outputs, so the AI could query it as a first step. But I feel like that wouldn’t be enough, and I wouldn’t be able to force it to use a particular schema for a node in case it hallucinates despite the vector DB query.
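For what it’s worth, the retrieval step on its own can be sketched like this. The embedding function here is a toy bag-of-words stand-in for a real embedding model or vector DB, and the node docs are illustrative, but the shape of the lookup is the same:

```python
import math
from collections import Counter

# Toy stand-in for a real embedding model (in practice this would be an
# embedding API call and the index would live in a vector DB).
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical node documentation you would store in the vector DB.
NODE_DOCS = {
    "http_request": "Send an HTTP request to a URL and return the response body",
    "send_email": "Send an email message to a recipient address",
    "branch": "Branch the workflow based on a condition",
}

index = {name: embed(doc) for name, doc in NODE_DOCS.items()}

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k node types whose docs best match the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda n: cosine(q, index[n]), reverse=True)
    return ranked[:k]

print(retrieve("call an API over HTTP"))  # http_request ranks first
```

As you say, this only narrows down *which* nodes are relevant; it does nothing to constrain the generated config, which is why the schema step below matters.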

Then I thought maybe I could use a multi-step process: all the nodes and their uses are provided in the prompt itself, the model responds with a list of the nodes it wants to use and in what order, my backend builds a schema from that list, and a separate prompt with the generated schema produces the actual workflow. But this approach also feels lacking, since a lot of functionality in the workflow/flowchart system is not intuitive. For example, passing data between nodes primarily happens through variables that can be referenced only in text fields, like a “${variable.data_one}” written in a text field of a workflow config. These kinds of things also need to be explained to the AI.
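The two-step idea can be sketched roughly like this: a backend registry holds per-node requirements, step one turns the model’s node list into a constrained JSON Schema, and a post-generation check validates required keys plus the ${...} variable syntax. All the node names, fields, and the reference pattern here are illustrative, not your actual system:

```python
import re

# Hypothetical per-node config requirements kept on the backend,
# so they never need to be repeated in the prompt.
NODE_SCHEMAS = {
    "http_request": {"required": ["url", "method"]},
    "send_email": {"required": ["to", "body"]},
}

# Matches references like "${variable.data_one}" (illustrative pattern).
VAR_REF = re.compile(r"\$\{[A-Za-z_][\w.]*\}")

def build_schema(selected: list[str]) -> dict:
    """Step 1: turn the model's chosen node list into a JSON Schema."""
    return {
        "type": "object",
        "properties": {
            "nodes": {
                "type": "array",
                "items": {
                    "oneOf": [
                        {
                            "type": "object",
                            "properties": {"type": {"const": name}},
                            "required": ["type"] + NODE_SCHEMAS[name]["required"],
                        }
                        for name in selected
                    ]
                },
            }
        },
        "required": ["nodes"],
    }

def check_node(node: dict) -> list[str]:
    """Step 2 (after generation): validate required keys and variable refs."""
    errors = [
        f"missing key: {k}"
        for k in NODE_SCHEMAS[node["type"]]["required"]
        if k not in node
    ]
    for k, v in node.items():
        if isinstance(v, str) and "${" in v and not VAR_REF.search(v):
            errors.append(f"malformed variable reference in {k!r}")
    return errors

node = {"type": "send_email", "to": "${trigger.email}", "body": "hi"}
print(check_node(node))  # [] -> valid
```

The validation pass is what catches the non-intuitive parts (like malformed variable references) deterministically, so the prompt only needs to teach the convention once rather than guarantee it.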

Passing all this data in the prompt every time feels like a waste of resources, and it feels like there has to be a better way.

I also thought about building datasets from existing workflows to fine-tune the model, but I don’t know how much that would help.

I’m also not sure which models would be best for this. Currently I’m looking at Gemini 2.5, but I feel like its schema declaration is lacking. OpenAI might be a better fit with its Zod integration and the fact that it allows unions, but I’m not sure I can get an API key for it right now.
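For reference, the union shape you’d want is expressible as plain JSON Schema with anyOf, independent of any one SDK; OpenAI’s structured outputs accept a schema of this general shape (and the Python SDK can also derive one from Pydantic models). The node names and fields below are illustrative:

```python
import json

# A union of two node config shapes as plain JSON Schema (illustrative).
workflow_schema = {
    "type": "object",
    "properties": {
        "nodes": {
            "type": "array",
            "items": {
                "anyOf": [
                    {
                        "type": "object",
                        "properties": {
                            "type": {"enum": ["http_request"]},
                            "url": {"type": "string"},
                        },
                        "required": ["type", "url"],
                    },
                    {
                        "type": "object",
                        "properties": {
                            "type": {"enum": ["send_email"]},
                            "to": {"type": "string"},
                        },
                        "required": ["type", "to"],
                    },
                ]
            },
        }
    },
    "required": ["nodes"],
}

# The same dict can be serialized and sent as the response schema.
print(json.dumps(workflow_schema)[:40])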
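For reference, the union shape you’d want is expressible as plain JSON Schema with anyOf, independent of any one SDK; OpenAI’s structured outputs accept a schema of this general shape (and the Python SDK can also derive one from Pydantic models). The node names and fields below are illustrative:

```python
import json

# A union of two node config shapes as plain JSON Schema (illustrative).
workflow_schema = {
    "type": "object",
    "properties": {
        "nodes": {
            "type": "array",
            "items": {
                "anyOf": [
                    {
                        "type": "object",
                        "properties": {
                            "type": {"enum": ["http_request"]},
                            "url": {"type": "string"},
                        },
                        "required": ["type", "url"],
                    },
                    {
                        "type": "object",
                        "properties": {
                            "type": {"enum": ["send_email"]},
                            "to": {"type": "string"},
                        },
                        "required": ["type", "to"],
                    },
                ]
            },
        }
    },
    "required": ["nodes"],
}

# The same dict can be serialized and sent as the response schema.
print(json.dumps(workflow_schema)[:40])
```

Keeping the schema as a plain dict like this makes it provider-agnostic, so you can swap between APIs as their structured-output support evolves.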

How would you approach this? Is there anything I’m missing, or a different way I could be thinking about it?

Thanks!


Great question. Your write-up shows strong structural awareness, and I agree: passing everything through the prompt quickly becomes inefficient.

One suggestion is to use a hybrid approach, combining vector embeddings with symbolic roles—something like a Mini Prisma system. Each node or module is linked to a fixed token (e.g., a label or role) and trained independently with a small, consistent embedding. This allows you to pre-train semantic logic for each component and avoid re-describing everything in the prompt.

We use a similar model in our architecture called EMI (Entorno Mental Inteligente). Each unit (node) has its own semantic definition and training context, and the AI queries these locally before generating a response. This reduces hallucination and keeps node behavior modular and deterministic.

To maintain precision and consistency, we also implement a distortion monitoring protocol: a set of internal rules that flags semantic drift, feedback loops, and hallucination tendencies. These rules include:

- Deviation limits for output vectors compared to baseline embeddings.
- Timestamped tracking of semantic changes across generations.
- Resonance checks, to detect instability when nodes conflict or duplicate.
- Self-diagnostics, where each node periodically revalidates its scope and role.

This framework not only improves system accuracy but also allows adaptive recovery when things go off course.
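The first of those rules (deviation limits against baseline embeddings) can be sketched as a simple cosine-similarity gate. The baseline vector, the threshold, and the 3-dimensional embeddings here are purely illustrative; real embeddings would be much higher-dimensional:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical baseline embedding recorded when the node was defined.
baseline = [0.9, 0.1, 0.2]
DRIFT_THRESHOLD = 0.85  # illustrative limit; would be tuned per node

def check_drift(output_vec):
    """Flag an output whose embedding drifts too far from the node's baseline."""
    sim = cosine(output_vec, baseline)
    return {"similarity": round(sim, 3), "flagged": sim < DRIFT_THRESHOLD}

print(check_drift([0.88, 0.12, 0.21]))  # close to baseline -> not flagged
print(check_drift([0.1, 0.95, 0.0]))    # off-topic output  -> flagged
```

The timestamped tracking and resonance checks would sit on top of the same similarity primitive, comparing successive generations to each other rather than to a fixed baseline.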

Let me know if you’d like to see how we structure the distortion monitors in code or YAML format.
