Please read the topic category description to understand what this is all about
Description
One of the most exciting developments in 2021 was the release of OpenAI’s CLIP model, which was trained on a variety of (text, image) pairs. One of the cool things you can do with this model is use it to combine text and image embeddings to perform neural style transfer. In neural style transfer, the idea is to provide a prompt like “a starry night painting” and an image, and then get the model to produce a painting of the image in that style.
The goal of this project is to learn whether CLIP can produce good paintings from text prompts.
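As a very rough sketch of what "combining text and image embeddings" looks like in code, CLIP can score an image against a few candidate prompts. The checkpoint name and the local image path below are illustrative assumptions, not a fixed recipe:

```python
# Minimal sketch: load CLIP from the Hub and compare one image against two prompts.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # illustrative local image
inputs = processor(
    text=["a starry night painting", "a photograph of a city"],
    images=image,
    return_tensors="pt",
    padding=True,
)
outputs = model(**inputs)
# Higher logits mean the image embedding is closer to that prompt's text embedding
print(outputs.logits_per_image.softmax(dim=-1))
```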
Model(s)
The CLIP models can be found on the Hub
Datasets
For this project, you probably won’t need an actual dataset to perform neural style transfer. Just a single image should be enough to tune CLIP and an image encoder. Of course, you are free to experiment with larger datasets if you want!
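One way to interpret "a single image should be enough" is to keep CLIP frozen and directly optimise the pixels of the image so that its CLIP embedding moves towards the prompt's embedding. The sketch below assumes the openai/clip-vit-base-patch32 checkpoint, a local content.jpg, and toy hyperparameters; a serious attempt would add a content loss, regularisation, and augmentations:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
model.requires_grad_(False)  # CLIP stays frozen; only the image is updated
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

prompt = "a starry night painting"
content = Image.open("content.jpg").convert("RGB")  # illustrative input image

# Frozen text embedding for the style prompt
text_inputs = processor(text=[prompt], return_tensors="pt").to(device)
with torch.no_grad():
    text_emb = model.get_text_features(**text_inputs)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

# Start from the (preprocessed) content image and optimise its pixels directly
pixels = processor(images=content, return_tensors="pt")["pixel_values"].to(device)
pixels = pixels.clone().requires_grad_(True)
optimizer = torch.optim.Adam([pixels], lr=0.05)

for step in range(200):
    image_emb = model.get_image_features(pixel_values=pixels)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    loss = -(image_emb * text_emb).sum()  # maximise cosine similarity with the prompt
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Be aware that optimising raw pixels against CLIP alone tends to produce noisy, adversarial-looking images, which is why much CLIP-guided work optimises a generator or adds heavy augmentations; choosing that setup is part of the project.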
Challenges
This project goes beyond the concepts introduced in Part II of the Course, so some familiarity with computer vision would be useful. Having said that, the Transformers API is similar for image tasks, so if you know how the pipeline() function works, then you’ll have no trouble adapting to this new domain.
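For instance, CLIP is already exposed through the zero-shot-image-classification pipeline; the checkpoint, image path, and labels below are just illustrative:

```python
from transformers import pipeline

# The familiar pipeline() API, applied to an image task with a CLIP checkpoint
classifier = pipeline(
    "zero-shot-image-classification",
    model="openai/clip-vit-base-patch32",
)
preds = classifier(
    "photo.jpg",  # path to a local image (a PIL image also works)
    candidate_labels=["a starry night painting", "a photograph", "a sketch"],
)
print(preds)  # list of {"label": ..., "score": ...} dicts
```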
Desired project outcomes
- Create a Streamlit or Gradio app on Spaces that allows a user to provide an image and a text prompt, and produces a painting of that image in the desired style
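A minimal Gradio sketch of such a Space could look like the following, with a placeholder stylise function standing in for the actual CLIP-guided style transfer:

```python
import gradio as gr
from PIL import Image

def stylise(image: Image.Image, prompt: str) -> Image.Image:
    # Placeholder: plug the CLIP-guided style transfer in here
    return image

demo = gr.Interface(
    fn=stylise,
    inputs=[gr.Image(type="pil"), gr.Textbox(label="Style prompt")],
    outputs=gr.Image(type="pil"),
    title="CLIP neural style transfer",
)

if __name__ == "__main__":
    demo.launch()
```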
Additional resources
You can Google “neural style transfer” to find plenty of information about this technique. Here is one advanced example to give you an idea:
Discord channel
To chat and organise with other people interested in this project, head over to our Discord and:
- Follow the instructions on the #join-course channel
- Join the #neural-style-transfer channel
Just make sure you comment here to indicate that you’ll be contributing to this project