Is there a specific generative model to describe User Interfaces?

There are a lot of generative image-to-text models hosted on Hugging Face.
But is there any model specifically for describing user interfaces?
All the models I checked are general-purpose, i.e. they recognize commonly used objects rather than domain-specific ones.
I want the model to describe a UI given a screenshot. E.g., for the screenshot below the model would write something like: “There are menu options: File, Edit, View, and some groups of icons: Selection, Image, Tools, Brushes.” Or it could be without text recognition: “The upper and lower parts are light blue; the middle part is almost white and contains a lot of icons.”


Like these?


@John6666 Thanks, yes, exactly.
Is there something ready to use? I read the ScreenAI paper but can’t find a ScreenAI model on HF. The second model, from Xiaomi, also looks cool, but the GitHub description suggests it needs to be trained before use.
The ideal would be a ready-to-use model that could be downloaded from Hugging Face.


Hmm… I can’t find UI image recognition models that are ready to use with Hugging Face…

Is a general-purpose VLM that accepts prompts not good enough? It can’t be used for very precise applications, though.
The example below is Qwen 2.5 VL 32B, but Aya Vision also performs very well, and Florence 2 and PaliGemma 2 are well-known smaller options. If you’re looking for LLM-level performance, try LLaVA.
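For reference, here is a minimal sketch of how one might prompt a Qwen 2.5 VL checkpoint with the transformers library to describe a UI screenshot. The model ID, prompt wording, and helper names are my own choices, not something from this thread; I use the 7B variant since 32B is heavy to run locally, and your screenshot path would replace the placeholder.

```python
def build_messages(image_path: str) -> list:
    """Build the chat-format payload asking the VLM to describe a UI screenshot."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {
                    "type": "text",
                    "text": (
                        "Describe this user interface: list the menu options, "
                        "toolbars, and groups of icons you can see."
                    ),
                },
            ],
        }
    ]


def describe_ui(image_path: str, model_id: str = "Qwen/Qwen2.5-VL-7B-Instruct") -> str:
    # Heavy imports kept local so that building messages stays lightweight.
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    # The processor's chat template handles both the text and the image.
    inputs = processor.apply_chat_template(
        build_messages(image_path),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    out = model.generate(**inputs, max_new_tokens=256)
    # Strip the prompt tokens, keep only the generated description.
    trimmed = out[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(trimmed, skip_special_tokens=True)[0]
```

Usage would be something like `print(describe_ui("screenshot.png"))`; swapping `model_id` lets you try the 32B checkpoint or another VLM with the same chat-template interface.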


Yes, that’s what I need. I’ll try these models. Thanks a lot!