CLIP-like contrastive vision-language models for German with pre-trained text and vision models
For this project, a pre-trained image model such as ViT and a pre-trained text model such as BERT can be used as the image encoder and text encoder, respectively.
Pre-trained ViT and BERT models can be found on the model hub. For German, multilingual BERT/RoBERTa models could also be used.
The WIT dataset can be used for this task.
Available training scripts
A training script for this will be provided soon. (see PR)
(Optional) Desired project outcome
The desired outcome is to train a CLIP-like model for the German language. This can be showcased with a Streamlit or Gradio app.
This model will require some modifications to the existing models. Specifically, we will need to add projection layers on top of both the text and image encoders so that their outputs can be mapped into a shared embedding space.
(Optional) Links to read up on
It’s quite an interesting project, count me in!
I would be happy to join this project. I’m super excited about JAX and have a little experience with it. My main background is in computer vision with TensorFlow, and I have some experience with CLIP-like architectures and zero-shot classification.
Hope I can add something to the project!
Sounds like an interesting topic to get started with JAX/Flax and CLIP.
My main background is in recommender systems and NLP, but I am interested in overlaps with CV, which makes this a great place to start!
Count me in!
Hi! The project sounds great. I already implemented a CLIP-like model with PyTorch Lightning / timm and Transformers (Universal Sentence Encoder and RoBERTa). Looking forward to working with CLIP, JAX/Flax, and TPUs. So count me in!
Great defining this project! cc @valhalla
This project sounds interesting! I would love to join it. I have experience with GANs, computer vision, and PyTorch Lightning, and I am currently studying German (B1). It would be a great learning experience for me. Please let me know how I can join.
Where can I find the project on Discord? What’s the channel name?