Input image and a question about the image and get a result

mms1998 · August 22, 2023, 3:04pm


Hello everyone,

I’m interested in developing software where I can input an image of a bathroom and then ask questions like, “What condition is the bathroom in?” and “Which decade does the bathroom appear to be from?”. I’ve tried using the sceneExplain plugin with ChatGPT, but the results have been off, suggesting that 40-year-old, worn-out bathrooms are in great condition.

I have a decent background in programming, so I believe I might need to train a model myself. However, I’m unsure about which categories of models are best suited for this purpose and which ones I can train on my own.

Can anyone provide guidance on the best models for this task, or perhaps link me to a tutorial on how to train them?

nielsr · August 22, 2023, 6:54pm

Hi,

You’re in luck, because Hugging Face released a model today that can do just that. The model is called IDEFICS, and it can be seen as a ChatGPT-style model that also accepts arbitrary sequences of images as input.

Here’s an example on a random bathroom image:

You can try the demo here: IDEFICS Playground - a Hugging Face Space by HuggingFaceM4.
Here’s a blog post about IDEFICS: Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model.

Do note that IDEFICS is pretty large (there are 2 sizes, 9 billion and 80 billion parameters). You can also train much smaller models to do this, like ViLT, BLIP or InstructBLIP.
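
To give an idea of what the smaller-model route looks like, here is a minimal sketch that asks questions about an image with the `visual-question-answering` pipeline from `transformers`. It assumes the public `dandelin/vilt-b32-finetuned-vqa` checkpoint and a local file named `bathroom.jpg` (a hypothetical path); the helper names `top_answer` and `ask_about_image` are made up for this example. Treat it as a starting point rather than a finished solution.

```python
from typing import Dict, List

# Assumed checkpoint: a ViLT model fine-tuned for visual question answering.
DEFAULT_MODEL_ID = "dandelin/vilt-b32-finetuned-vqa"


def top_answer(predictions: List[dict]) -> str:
    """Return the answer with the highest score from a list of
    {"answer": ..., "score": ...} dicts, as produced by the VQA pipeline."""
    best = max(predictions, key=lambda p: p["score"])
    return best["answer"]


def ask_about_image(image_path: str,
                    questions: List[str],
                    model_id: str = DEFAULT_MODEL_ID) -> Dict[str, str]:
    """Run each question against one image and return {question: answer}."""
    # Imported lazily so the helper above works without these packages.
    from PIL import Image
    from transformers import pipeline

    vqa = pipeline("visual-question-answering", model=model_id)
    image = Image.open(image_path).convert("RGB")

    answers = {}
    for question in questions:
        predictions = vqa(image=image, question=question, top_k=3)
        answers[question] = top_answer(predictions)
    return answers


# Example usage (needs transformers, torch, Pillow and a local image):
# answers = ask_about_image(
#     "bathroom.jpg",  # hypothetical path
#     ["What condition is the bathroom in?",
#      "Which decade does the bathroom appear to be from?"],
# )
# for question, answer in answers.items():
#     print(question, "->", answer)
```

Keep in mind that this ViLT checkpoint picks short answers from a fixed vocabulary, so out of the box it is unlikely to judge condition or decade reliably. For that you would probably need to fine-tune on your own labeled bathroom images, or use a generative model such as BLIP, InstructBLIP or IDEFICS.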
