Creation of Images from Text-Prompt (Customized Training)

Is there a AI model which generates images from text prompt?

I think that all of the currently popular Text-to-Image models are basically capable of doing this. (SD2.1 is also capable.)
If you want to use prompts that are closer to natural language, you can achieve this by using newer architectures such as FLUX. (In this case, it is too heavy, so I think you will have to use some kind of cloud service.:sweat_smile:)

And the model hasn’t been trained on any real-data, something like DeepMind AlphaGo

They say that for Go, they don’t need data from real professional Go players to strengthen the model anymore. In the case of Go, there is a mathematical correct answer, so that approach is possible. You just need to reduce the number of incorrect answers.

However, words, pictures and photographs are almost entirely dependent on human perception, and are a kind of illusion created by humans. Birds and insects see colors differently, and the concepts that words refer to are even more unstable. In other words, while these models may be suitable or unsuitable for a given purpose, there are not many right or wrong answers, and relying solely on mathematics is probably not a good approach. The only thing that can be guaranteed physically is shape.
In addition, there has recently been a widespread attempt to train AI models using data output by AI models that have reached a certain level of maturity (synthetic data). However, this does not mean that real-world data has been excluded.
Also, there are several models that have been trained using only data that does not raise any legal issues (Creative Commons-compliant models). There are also some in HF.

However, I don’t know of any model that doesn’t use any real-world data. If you do that, I think you could create a model that generates some kind of image that doesn’t exist in the real world (I don’t even know if humans would be able to recognize it as an image)…:roll_eyes:
Well, if you initialize the model data randomly and then train it, you might be able to get something close, but the question is how to provide the images for this without using real-world data… It might be a chicken-and-egg problem.

1 Like