I'm looking for an 'image to text' model

Hi there!

I’m using OpenArt.ai to generate images. I would like to know if there is a model that can write context, such as a fictional or fantasy backstory, for the generated images.
With GPT-4 it’s easy to generate entire stories from just a few sentences of input. I would like a model that does the same with an image as input, plus a few keywords to steer it. Is such a model available? I would love to hear about it!

Thanks!