Use OpenAI's CLIP for style transfer

:wave: Please read the topic category description to understand what this is all about


One of the most exciting developments in 2021 was the release of OpenAI’s CLIP model, which was trained on a variety of (text, image) pairs. One of the cool things you can do with this model is use it to combine text and image embeddings to perform neural style transfer. In neural style transfer, the idea is to provide a prompt like “a starry night painting” and an image, and then get the model to produce a painting of the image in that style.
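
As a rough illustration of what "combining text and image embeddings" can mean in practice, here is a minimal sketch of the kind of objective CLIP-guided stylization often reduces to: push the embedding of the generated image towards the embedding of the prompt by minimising a cosine distance. The function names and the toy 3-dimensional vectors are made up for this example; real CLIP embeddings have hundreds of dimensions, and a usable loss would probably also need a content term so the output still resembles the input image.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_style_loss(image_embedding, text_embedding):
    """Hypothetical style loss: 0 when the embeddings point the same way,
    1 when they are orthogonal, 2 when they point in opposite directions."""
    return 1.0 - cosine_similarity(image_embedding, text_embedding)


# Toy stand-ins for CLIP embeddings (assumed values, not real model output).
text = [1.0, 0.0, 0.0]       # embedding of the prompt
aligned = [2.0, 0.0, 0.0]    # image embedding pointing the same way as the prompt
unrelated = [0.0, 3.0, 0.0]  # image embedding orthogonal to the prompt

print(clip_style_loss(aligned, text))    # small loss: image matches the prompt
print(clip_style_loss(unrelated, text))  # larger loss: image does not match
```

In an actual project the two embeddings would come from CLIP's image and text encoders, and the loss would be backpropagated to whatever produces the stylized image.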

The goal of this project is to learn whether CLIP can produce good paintings from text prompts.


The CLIP models can be found on the Hub.
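
For anyone who wants a starting point, loading a CLIP checkpoint from the Hub with :hugs: Transformers might look like the sketch below. It assumes the `transformers` library (and a backend such as PyTorch) is installed, and it picks `openai/clip-vit-base-patch32` as an example checkpoint; the `load_clip` helper is just a name chosen for this post, and any other CLIP checkpoint on the Hub should work the same way.

```python
# Example checkpoint; an assumption for this sketch, swap in any CLIP model ID.
MODEL_ID = "openai/clip-vit-base-patch32"


def load_clip(model_id=MODEL_ID):
    """Download (or load from the local cache) a CLIP model and its processor."""
    # Imported inside the function so the heavy dependency is only needed
    # when the model is actually loaded.
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained(model_id)
    processor = CLIPProcessor.from_pretrained(model_id)
    return model, processor


# Usage (downloads the weights on the first call):
# model, processor = load_clip()
```

The processor handles both tokenizing the prompt and preprocessing the image, so the same object can prepare both inputs for the model.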


For this project, you probably won’t need an actual dataset to perform neural style transfer. Just a single image should be enough to tune CLIP and an image encoder. Of course, you are free to experiment with larger datasets if you want!


This project goes beyond the concepts introduced in Part II of the :hugs: Course, so some familiarity with computer vision would be useful. Having said that, the :hugs: Transformers API is similar for image tasks, so if you know how the pipeline() function works, then you’ll have no trouble adapting to this new domain.

Desired project outcomes

  • Create a Streamlit or Gradio app on :hugs: Spaces that allows a user to provide an image and a text prompt, and produces a painting of that image in the desired style

Additional resources

You can Google “neural style transfer” to find plenty of information about this technique. Here is one advanced example to give you an idea:

Discord channel

To chat and organise with other people interested in this project, head over to our Discord and:

  • Follow the instructions on the #join-course channel

  • Join the #neural-style-transfer channel

Just make sure you comment here to indicate that you’ll be contributing to this project :slight_smile:

Very interesting topic, I am in!

I’d also like to contribute to this project. I believe it’s quite aligned with another project I’m working on, namely “Image search using CLIP”. :slight_smile:

Hi, I would like to contribute to this project :slight_smile:
