How to make a model respect static UI layout and generate only overlay characters (ControlNet / SDXL / IPAdapter?)

Hi everyone! :waving_hand:
We’re building a Web3 customization tool and are currently working on an AI pipeline where the model should:

  • Understand that the UI layout in the center (a wallet login screen) is not to be redrawn
  • Generate only one object or character that interacts with the interface (e.g., leans toward a button, sits beside it, etc.)
  • Return a transparent PNG, without adding a background or modifying the UI
  • Ideally support prompt + guide image + layout-awareness

:package: What we already tried:

:white_check_mark: We created a full JSON representation of the wallet layout, including positions, button labels, sizes, safe zones, and colors.
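
For reference, here is roughly what that layout JSON looks like (abridged, with illustrative field names and values rather than our exact schema):

```json
{
  "canvas": { "width": 1024, "height": 1024 },
  "ui": {
    "bounds": { "x": 312, "y": 180, "width": 400, "height": 640 },
    "elements": [
      { "label": "Unlock", "type": "button", "x": 470, "y": 490, "width": 84, "height": 36 }
    ],
    "safeZones": [
      { "x": 312, "y": 180, "width": 400, "height": 640 }
    ],
    "colors": { "background": "#1C1C28", "accent": "#AB9FF2" }
  }
}
```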

:white_check_mark: We also generate a guide image of the UI as a reference (Phantom Wallet login mockup)

:white_check_mark: We built a promptBuilder.ts that merges the following (sketched after this list):

  1. Hard-coded constraints (“Do not cover the interface”, etc.)
  2. The layout as descriptive text (“Unlock button at x:470, y:490”)
  3. User prompt (e.g., “Pepe touches the unlock button”)
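
The merge itself is simple; here is a Python sketch of what promptBuilder.ts does (names and file paths are illustrative, and the layout schema is the abridged one above):

```python
# Rough Python equivalent of promptBuilder.ts (illustrative, not our exact code).
import json

HARD_CONSTRAINTS = [
    "Do not cover or redraw the interface.",
    "Do not add a background; output only the new character.",
]

def describe_layout(layout: dict) -> str:
    """Turn the layout JSON into short descriptive sentences."""
    parts = [
        f'{el["label"]} {el["type"]} at x:{el["x"]}, y:{el["y"]} ({el["width"]}x{el["height"]} px)'
        for el in layout["ui"]["elements"]
    ]
    return "UI elements: " + "; ".join(parts) + "."

def build_prompt(layout: dict, user_prompt: str) -> str:
    """Merge hard constraints, the layout description, and the user's request."""
    return " ".join([*HARD_CONSTRAINTS, describe_layout(layout), user_prompt])

if __name__ == "__main__":
    with open("wallet_layout.json") as f:
        layout = json.load(f)
    print(build_prompt(layout, "Pepe touches the unlock button"))
```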

:white_check_mark: Then we tested:

  • lucataco/sdxl-controlnet (:warning: now returns 404)
  • stabilityai/stable-diffusion-xl-base-1.0 via the Hugging Face API
  • IPAdapter in local pipelines (see the diffusers sketch after this list)
  • ComfyUI to build manual graph workflows
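
For the IPAdapter attempt, our local setup is roughly the following (a minimal diffusers sketch; the model IDs, guide-image path, and scale value are examples, not a recommendation):

```python
# Minimal SDXL + IP-Adapter sketch with diffusers (paths and IDs are illustrative).
import torch
from diffusers import StableDiffusionXLPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Load IP-Adapter weights for SDXL and set how strongly the guide image steers the output.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="sdxl_models",
                     weight_name="ip-adapter_sdxl.bin")
pipe.set_ip_adapter_scale(0.6)

guide = load_image("phantom_wallet_mockup.png")  # the UI guide image
image = pipe(
    prompt="Pepe the frog leaning toward a button, sticker style",
    ip_adapter_image=guide,
    num_inference_steps=30,
).images[0]
image.save("character.png")
```

This keeps the style close to the guide image, but (as noted under the issues below) it gives no spatial guarantee about the Unlock button’s area.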

:cross_mark: Issues we face:

  • Most models tend to redraw the UI layout, even when told not to
  • A background often reappears (even when the prompt asks for transparency)
  • Character generation isn’t aware of UI boundaries (like “don’t cover the Unlock button”)
  • IPAdapter respects style, but lacks fine-grained interaction control

:bullseye: Our ideal model:

We’re looking for a model (or combo) that can:

  • Accept an image + prompt + an optional JSON layout or mask
  • Draw only the new character (no background, no UI duplication)
  • Ideally support a ControlNet-style mask or other fine spatial constraints (we can derive one from our layout JSON, as sketched below)
  • Return a PNG with transparency
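
One thing we can already produce from the layout JSON is a mask: white where the model may paint, black over the UI/safe zones. A Pillow sketch, assuming the abridged schema shown earlier:

```python
# Build an inpainting mask from the layout JSON: white = may paint, black = keep untouched.
import json
from PIL import Image, ImageDraw

with open("wallet_layout.json") as f:
    layout = json.load(f)

w, h = layout["canvas"]["width"], layout["canvas"]["height"]
mask = Image.new("L", (w, h), 255)          # start fully paintable
draw = ImageDraw.Draw(mask)

for zone in layout["ui"]["safeZones"]:      # black out every protected region
    draw.rectangle(
        [zone["x"], zone["y"], zone["x"] + zone["width"], zone["y"] + zone["height"]],
        fill=0,
    )

# This could be tightened further, e.g. leave white only where the character should appear.
mask.save("inpaint_mask.png")
```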

:handshake: What we’d love from the community:

  • Any suggestions for models or pipelines that could help?
  • Has anyone tried layout-aware generation like this?
  • Would custom ControlNet training or a DreamBooth variant help here?

We’re happy to share more screenshots or JSON layouts if needed.

Thanks in advance — this forum has been super helpful for us so far :folded_hands:

In terms of prompt comprehension, FLUX comes to mind. It might also be worth trying inpainting (minimal sketch below). If you’re building your own pipeline, VTON might be a similar approach.
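
For the inpainting route, a minimal sketch with the public SDXL inpainting checkpoint could look like the following. The mask is the kind of layout-derived mask described above (white = paintable, black = protected UI); since the diffusion model itself only outputs RGB, a background-removal pass such as rembg (or diffing against the original UI image) is one heuristic way to get the transparent overlay afterwards. Model IDs, file names, and parameter values are illustrative:

```python
# Masked inpainting over the UI guide image, then background removal for transparency.
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image
from rembg import remove

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16, variant="fp16",
).to("cuda")

ui = load_image("phantom_wallet_mockup.png").resize((1024, 1024))
mask = load_image("inpaint_mask.png").resize((1024, 1024))   # white = paint, black = keep UI

result = pipe(
    prompt="Pepe the frog leaning toward the unlock button, sticker style",
    image=ui,
    mask_image=mask,
    strength=0.99,              # repaint the masked area almost completely
    num_inference_steps=30,
).images[0]

# Pixels under the black mask stay as they were, so the UI is not redrawn.
# rembg should treat the flat UI/background as background and keep the character as foreground.
overlay = remove(result)        # returns an RGBA PIL image
overlay.save("character_overlay.png")
```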