Poster2Plot: Generate a movie/TV show plot from its poster


Our team is building an image captioning model that can generate a movie/TV show plot from its poster.

The goal of this project is to create an image captioning model that pairs a transformer encoder for vision, such as Vision Transformer (ViT), with a transformer decoder language model, such as GPT-2.
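To make the encoder/decoder pairing concrete, here is a minimal sketch using the `VisionEncoderDecoderModel` class from `transformers`. It is not the team's training code: the tiny config sizes, the dummy tensors, and the token ids are assumptions chosen so the example runs without downloading any weights. It only shows how a ViT-style encoder and a GPT-2-style decoder are wired together and how a captioning loss is computed from an image batch and target token ids.

```python
import torch
from transformers import (
    GPT2Config,
    ViTConfig,
    VisionEncoderDecoderConfig,
    VisionEncoderDecoderModel,
)

# Tiny, randomly initialised configs (illustrative assumptions, not the real sizes).
# With the checkpoints named later in this thread, one would instead call
# VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
#     "google/vit-base-patch16-224-in21k", "gpt2")
encoder_config = ViTConfig(
    image_size=32,
    patch_size=16,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)
decoder_config = GPT2Config(
    vocab_size=100,
    n_positions=32,
    n_embd=32,
    n_layer=1,
    n_head=2,
    bos_token_id=0,
    eos_token_id=1,
)

# Marks the decoder as a decoder with cross-attention over the encoder outputs.
config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(
    encoder_config, decoder_config
)
# Needed so the model can build decoder inputs by shifting the labels right.
config.decoder_start_token_id = 0  # assumed id for this toy vocabulary
config.pad_token_id = 0            # assumed id for this toy vocabulary

model = VisionEncoderDecoderModel(config=config)

# Dummy batch: 2 "posters" (3x32x32) and 2 "plots" of 8 token ids each.
pixel_values = torch.randn(2, 3, 32, 32)
labels = torch.randint(1, 100, (2, 8))

outputs = model(pixel_values=pixel_values, labels=labels)
print(outputs.loss.item(), tuple(outputs.logits.shape))
```

In a real run the pixel values would come from an image processor applied to the posters and the labels from a tokenizer applied to the plots. GPT-2 has no padding token by default, so one would typically have to be assigned before batching.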


Any vision-based encoder and language model decoder would be a good candidate for training the VisionEncoderDecoderModel for image captioning. We are trying the following models first:


We are using publicly available IMDb datasets to train the model.
Some examples:


The main challenge is to create a good dataset of posters paired with plots. It will also be interesting to see whether the model gives good predictions for non-English movies/TV shows.

Desired project outcomes

We will create a Streamlit or Gradio app on 🤗 Spaces that can predict a movie/TV show plot from its poster.


Let’s give it a try.


@dk-crazydiv and I were able to train a VisionEncoderDecoderModel to generate a movie/TV show plot from its poster. We used the google/vit-base-patch16-224-in21k encoder and the gpt2 decoder.

We have uploaded the model to the 🤗 Model Hub: poster2plot

@lewtun Here is the link to the Gradio app on 🤗 Spaces: poster2plot

We are still working on improving the model.


Wow, this is an incredibly cool project and Space that you’ve created - great job! Thank you for taking part in the course event 🙂


Awesome work @dsr and team!


Thank you. It was indeed very fun for us to build and super fun to test manually as well. The course content, along with some existing code snippets, helped a lot. The ease with which one can build and demo projects like these in the HF ecosystem cannot be praised enough. The ecosystem cuts the idea-to-deployment time by at least 10x. With the Hub, datasets, the abstractions in the transformers library, and Spaces, we were able to do in a day what would probably have taken weeks.