Greetings.
Would it be possible to suggest models that can generate both text and images from a text-only prompt?
I think what you're describing falls under any-to-any models.
You can also generate text and images by combining two models, an LLM and a diffusion model.
But a single model that does both in one would be an any-to-any model, and as far as I know no model has this ability yet.
It's still under development.
Thanks @Alanturner2 for the feedback.
So the only way I see right now is to integrate the two models: pass the generated text to a diffusion model wherever images are required,
then, as a final step, merge the text and images into a single output.
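For what it's worth, here is a minimal sketch of that two-stage pipeline using the transformers and diffusers libraries. The model IDs (gpt2, stabilityai/stable-diffusion-2-1) are just illustrative choices, and the final "merge" is a plain dict; swap in whichever models and output format you actually need.

```python
# Minimal sketch: LLM -> diffusion -> merged output.
# Assumes `pip install transformers diffusers torch` and a CUDA GPU.
import torch
from transformers import pipeline
from diffusers import StableDiffusionPipeline

# Stage 1: generate text from the user's prompt with an LLM.
text_gen = pipeline("text-generation", model="gpt2")
prompt = "A short description of a futuristic city:"
text = text_gen(prompt, max_new_tokens=60)[0]["generated_text"]

# Stage 2: feed the generated text to a diffusion model as the image prompt.
# (Stable Diffusion's CLIP text encoder truncates prompts past 77 tokens.)
sd = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
image = sd(text).images[0]

# Final step: merge both modalities into a single output.
result = {"text": text, "image": image}
image.save("output.png")
```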
Yeah, that's right.
You can use an LLM and a diffusion model as the base models,
then add an encoder/decoder stage for the diffusion model.
Then you can generate images and text at the same time.
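To make "at the same time" a bit more concrete without training a unified model, one option is to have the LLM wrap anything it wants illustrated in a tag, then render each tag with the diffusion pipeline. The `<img>...</img>` convention below is purely an assumption for illustration, not a standard API:

```python
import re

def render_interleaved(llm_output, sd_pipe):
    """Split hypothetical <img>prompt</img> tags out of the LLM's text and
    render each captured prompt with a diffusers pipeline.
    Returns (text_with_placeholders, list_of_images)."""
    parts, images = [], []
    # re.split with a capture group alternates plain text / captured prompts.
    for i, chunk in enumerate(re.split(r"<img>(.*?)</img>", llm_output)):
        if i % 2 == 1:  # odd indices are the captured image prompts
            images.append(sd_pipe(chunk).images[0])
            parts.append(f"[image {len(images)}]")
        else:
            parts.append(chunk)
    return "".join(parts), images
```

You would instruct the LLM in its prompt to wrap any scene it wants illustrated in those tags, then post-process its output with this helper.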
It truly looks fascinating, mate! I have not used anything like this before. Interesting.
There are many models that fit your requirement:
1. OpenAI GPT-4
This is perhaps the most advanced option for multimodal capabilities (image output comes via the DALL·E 3 integration in ChatGPT).
2. Google DeepMind's Gemini
3. Midjourney and Stable Diffusion (image generation only, so you'd still pair them with an LLM for the text)
4. CLIP and Artbreeder (CLIP is an embedding model used to guide generation rather than generate on its own)