Please read the topic category description to understand what this is all about
One of the most exciting developments in 2021 was the release of OpenAI’s CLIP model, which was trained on a variety of (text, image) pairs. One of the cool things you can do with this model is use it to combine text and image embeddings to perform neural style transfer. In neural style transfer, the idea is to provide a prompt like “a starry night painting” and an image, and then get the model to produce a painting of the image in that style.
The goal of this project is to learn whether CLIP can produce good paintings from text prompts.
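To make the idea concrete, here is a minimal sketch of CLIP-guided image optimization. To keep it runnable without downloading weights, the real CLIP encoders are swapped for tiny random linear stand-ins (an assumption for illustration only); with the actual model you would use `CLIPModel.get_image_features` / `get_text_features` from 🤗 Transformers in their place, and the loop otherwise stays the same:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-ins for CLIP's frozen encoders (assumption: a shared 512-dim
# embedding space, as in the real model).
image_encoder = torch.nn.Linear(3 * 32 * 32, 512)
text_embedding = torch.randn(512)  # pretend output of the text encoder

for p in image_encoder.parameters():
    p.requires_grad_(False)  # CLIP stays frozen

# The image itself is the only thing being optimized.
image = torch.rand(3, 32, 32, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

losses = []
for step in range(100):
    optimizer.zero_grad()
    img_emb = image_encoder(image.flatten())
    # Objective: push the image embedding toward the text-prompt
    # embedding (1 - cosine similarity).
    loss = 1 - F.cosine_similarity(img_emb, text_embedding, dim=0)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Real projects like StyleCLIP optimize a generator's latent code instead of raw pixels, but the loss is the same idea.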
The CLIP models can be found on the Hub.
For this project, you probably won’t need an actual dataset to perform neural style transfer. Just a single image should be enough, since CLIP is typically kept frozen and used to guide the optimization. Of course, you are free to experiment with larger datasets if you want!
This project goes beyond the concepts introduced in Part II of the Course, so some familiarity with computer vision would be useful. Having said that, the Transformers API is similar for image tasks, so if you know how the pipeline() function works, then you’ll have no trouble adapting to this new domain.
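As a quick taste of how familiar the API feels for images, here is a short sketch using the zero-shot image classification pipeline with a CLIP checkpoint (the dummy blue image is just a placeholder; you would load your own):

```python
from transformers import pipeline
from PIL import Image

# A dummy image stands in for a real one here.
img = Image.new("RGB", (224, 224), color="blue")

# Same pipeline() pattern as for text tasks, just a different task name.
clf = pipeline(
    "zero-shot-image-classification",
    model="openai/clip-vit-base-patch32",
)
preds = clf(img, candidate_labels=["a starry night painting", "a photo of a cat"])
print(preds)  # list of {"score": ..., "label": ...} dicts, best match first
```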
- Create a Streamlit or Gradio app on Spaces that allows a user to provide an image and a text prompt, and produces a painting of that image in the desired style
You can Google “neural style transfer” to find plenty of information about this technique. Here is one advanced example to give you an idea:
- GitHub - orpatashnik/StyleCLIP: Official Implementation for "StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery" (ICCV 2021 Oral)
To chat and organise with other people interested in this project, head over to our Discord and:
Follow the instructions on the
Just make sure you comment here to indicate that you’ll be contributing to this project.