Greetings.
Would it be possible to suggest models that can generate both text and images from a text-only prompt?
I think what you're describing falls under any-to-any models.
You can also generate text and images by combining two models, an LLM and a diffusion model.
But a single model that does both in one would be an any-to-any model, and as far as I know no model has this ability yet.
It's still under development.
Thanks @Alanturner2 for the feedback.
So the only way I see right now is to integrate the two models: pass the generated text to a diffusion model wherever images are required,
then, as a final step, merge the text and images into a single output.
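For what it's worth, here is a minimal sketch of that two-stage pipeline using the transformers and diffusers libraries. The model IDs (gpt2, stabilityai/stable-diffusion-2-1) are just illustrative choices, and the final "merge" is a plain dict; swap in whichever models and output format you actually need.

```python
# Minimal sketch: LLM -> diffusion -> merged output.
# Assumes `pip install transformers diffusers torch` and a CUDA GPU.
import torch
from transformers import pipeline
from diffusers import StableDiffusionPipeline

# Stage 1: generate text from the user's prompt with an LLM.
text_gen = pipeline("text-generation", model="gpt2")
prompt = "A short description of a futuristic city:"
text = text_gen(prompt, max_new_tokens=60)[0]["generated_text"]

# Stage 2: feed the generated text to a diffusion model as the image prompt.
# (Stable Diffusion's CLIP text encoder truncates prompts past 77 tokens.)
sd = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
image = sd(text).images[0]

# Final step: merge both modalities into a single output.
result = {"text": text, "image": image}
image.save("output.png")
```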
Yeah, that's right.
You can use an LLM and a diffusion model as the base models,
then add an encoder/decoder stage for the diffusion model.
Then you can generate images and text at the same time.
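To make "at the same time" a bit more concrete without training a unified model, one option is to have the LLM wrap anything it wants illustrated in a tag, then render each tag with the diffusion pipeline. The `<img>...</img>` convention below is purely an assumption for illustration, not a standard API:

```python
import re

def render_interleaved(llm_output, sd_pipe):
    """Split hypothetical <img>prompt</img> tags out of the LLM's text and
    render each captured prompt with a diffusers pipeline.
    Returns (text_with_placeholders, list_of_images)."""
    parts, images = [], []
    # re.split with a capture group alternates plain text / captured prompts.
    for i, chunk in enumerate(re.split(r"<img>(.*?)</img>", llm_output)):
        if i % 2 == 1:  # odd indices are the captured image prompts
            images.append(sd_pipe(chunk).images[0])
            parts.append(f"[image {len(images)}]")
        else:
            parts.append(chunk)
    return "".join(parts), images
```

You would instruct the LLM in its prompt to wrap any scene it wants illustrated in those tags, then post-process its output with this helper.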
It truly looks fascinating, mate! I have not used anything like this before. Interesting.
There are many models that fit your requirement:
1. OpenAI GPT-4
This is perhaps the most advanced option for multimodal capabilities (image output comes via the DALL·E 3 integration in ChatGPT).
2. Google DeepMind's Gemini
3. Midjourney and Stable Diffusion (image generation only, so you'd still pair them with an LLM for the text)
4. CLIP and Artbreeder (CLIP is an embedding model used to guide generation rather than generate on its own)