CLIP-like contrastive vision-language models for German with pre-trained text and vision models


For this project, a pre-trained image model like ViT and a pre-trained text model like BERT can be used as the image encoder and the text encoder, respectively.
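For anyone new to the setup: the two encoders are tied together by a symmetric contrastive loss over a batch of image-caption pairs. Below is a minimal JAX sketch of that objective, not the project's training code. The function name `clip_contrastive_loss` is made up for illustration, and the temperature is fixed at 0.07 as an assumption (CLIP itself learns this value). The embeddings are toy one-hot vectors rather than real encoder outputs.

```python
import jax
import jax.numpy as jnp


def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over image-text similarity logits (illustrative)."""
    # L2-normalize so the dot product below is a cosine similarity.
    image_emb = image_emb / jnp.linalg.norm(image_emb, axis=-1, keepdims=True)
    text_emb = text_emb / jnp.linalg.norm(text_emb, axis=-1, keepdims=True)

    # logits[i, j] = similarity of image i and caption j, scaled by the temperature.
    logits = image_emb @ text_emb.T / temperature

    # The matching pair for row i sits on the diagonal, so we pick the
    # diagonal log-probabilities in both directions.
    image_to_text = jnp.diagonal(jax.nn.log_softmax(logits, axis=1))
    text_to_image = jnp.diagonal(jax.nn.log_softmax(logits, axis=0))
    return -(image_to_text.mean() + text_to_image.mean()) / 2.0


# Toy batch of 4 pairs: one-hot "embeddings" stand in for encoder outputs.
eye = jnp.eye(4)
loss_matched = clip_contrastive_loss(eye, eye)                          # every pair aligned
loss_shuffled = clip_contrastive_loss(eye, jnp.roll(eye, 1, axis=0))    # every pair wrong
```

With perfectly aligned pairs the loss is close to zero, and when every caption is paired with the wrong image it becomes large, which is the signal that pulls matching images and captions together during training.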


Pre-trained ViT and BERT models can be found on the model hub. We could also use multilingual BERT/RoBERTa models for the German language.


The WIT dataset can be used for this task.

Available training scripts

A training script for this will be provided soon. (see PR)

(Optional) Desired project outcome

The desired outcome is to train a CLIP-like model for the German language. This can be showcased with a Streamlit or Gradio app.

(Optional) Challenges

This model will require some modifications to the existing models. Specifically, we will need to add projection layers in both the text and image encoder models.
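To make the projection-layer idea concrete, here is a minimal JAX sketch, assuming each encoder returns one pooled feature vector per input. The helper names (`init_projection`, `project`), the feature sizes (1024 and 768), and the 512-dimensional shared space are all hypothetical choices for illustration. The projections are bias-free linear maps here, and random arrays stand in for the real ViT and BERT outputs.

```python
import jax
import jax.numpy as jnp


def init_projection(key, in_dim, out_dim):
    """Random kernel for a bias-free linear projection (hypothetical helper)."""
    return jax.random.normal(key, (in_dim, out_dim)) / jnp.sqrt(in_dim)


def project(features, kernel):
    """Map encoder features into the shared space and L2-normalize them."""
    projected = features @ kernel
    return projected / jnp.linalg.norm(projected, axis=-1, keepdims=True)


key = jax.random.PRNGKey(0)
k_img_w, k_txt_w, k_img_x, k_txt_x = jax.random.split(key, 4)

# Assumed sizes: the two encoders may have different widths,
# so each one gets its own projection into a common dimension.
vision_dim, text_dim, proj_dim = 1024, 768, 512
image_kernel = init_projection(k_img_w, vision_dim, proj_dim)
text_kernel = init_projection(k_txt_w, text_dim, proj_dim)

# Random stand-ins for pooled encoder outputs (2 images, 3 captions).
image_features = jax.random.normal(k_img_x, (2, vision_dim))
text_features = jax.random.normal(k_txt_x, (3, text_dim))

image_emb = project(image_features, image_kernel)
text_emb = project(text_features, text_kernel)

# Cosine similarity between every image and every caption.
similarity = image_emb @ text_emb.T
```

Once both modalities live in the same space, the similarity matrix can be fed to a contrastive loss during training or used directly for retrieval and zero-shot classification.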

(Optional) Links to read upon


It’s quite an interesting project, count me in :grinning:


I would be happy to join this project. I’m super excited about JAX and have a little experience with it. My main background is in computer vision with TensorFlow, and I have some experience with CLIP-like architectures and zero-shot classification.
Hope I can add something to the project!


Hi guys!

Sounds like an interesting topic to get started with JAX/Flax and CLIP.
My main background is from recommender systems and NLP but I am interested in overlaps with CV, which makes this a great place to start!

Count me in! :slight_smile:


Hi! The project sounds great. I already implemented a CLIP-like model with PyTorch Lightning / timm and transformers (Universal Sentence Encoder and RoBERTa). Looking forward to working with CLIP, JAX/Flax and TPUs. So count me in :smiley:



Great job defining this project! cc @valhalla

This project sounds interesting! I would love to join it. I have experience with GANs, computer vision, and PyTorch Lightning, and I am also studying German (B1). It would be a great learning experience for me. Please let me know how I can join.


Where can I find the project on Discord? What's the channel name?