KoCLIP: Pretraining CLIP on Korean

jaketae · June 24, 2021, 5:03pm

KoCLIP

Building on publicly available Korean language models, we want to train a multimodal vision-language model. Specifically, we train CLIP on Korean datasets using KoBERT and ViT as backbones.

Model

ViT, KoBERT (or any other Korean encoder LM)
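
For reference, CLIP-style training pulls matching image and caption embeddings together and pushes mismatched pairs in the batch apart. Below is a minimal JAX sketch of that symmetric contrastive loss. It is not taken from the training script; the function name `clip_contrastive_loss` and the fixed `temperature=0.07` are illustrative assumptions (CLIP itself learns the temperature).

```python
import jax
import jax.numpy as jnp


def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over image-text similarities (illustrative sketch)."""
    # L2-normalize so that dot products are cosine similarities.
    image_emb = image_emb / jnp.linalg.norm(image_emb, axis=-1, keepdims=True)
    text_emb = text_emb / jnp.linalg.norm(text_emb, axis=-1, keepdims=True)

    # logits[i, j] = similarity between image i and caption j, shape (batch, batch).
    logits = image_emb @ text_emb.T / temperature

    # Matching pairs sit on the diagonal of the similarity matrix.
    image_to_text = -jnp.mean(jnp.diagonal(jax.nn.log_softmax(logits, axis=1)))
    text_to_image = -jnp.mean(jnp.diagonal(jax.nn.log_softmax(logits, axis=0)))
    return (image_to_text + text_to_image) / 2
```

Here `image_emb` would come from ViT and `text_emb` from KoBERT (or whichever Korean encoder is plugged in), both already mapped to the same dimension.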

Datasets

KETI has released a Korean image captioning dataset, available on AI Hub. We can also utilize the WIT dataset, which is a multilingual dataset scraped from Wikipedia.

Training Script

The training script is (almost) already available; see here.

Challenges

KoBERT likely performs slightly worse than the default English BERT. While this could be a bottleneck, we can write the script so that different LMs are easy to swap in. That way, a better Korean LM can be used as soon as one is released.
There might be minor architectural adjustments we have to make (e.g. adding projection layers).
Attaining fluency in JAX will take time and effort.
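
To make the first two points concrete, here is a rough JAX sketch of how a projection layer lets encoders with different output sizes share one embedding space, so that a text encoder can be swapped without touching the rest of the code. Everything in it is an assumption for illustration: `init_projection`, `embed`, the stand-in encoders, and the dimensions (768, 1024, 512) are hypothetical and not taken from the actual script.

```python
import jax
import jax.numpy as jnp


def init_projection(key, in_dim, out_dim):
    """Bias-free linear projection matrix (hypothetical helper)."""
    return jax.random.normal(key, (in_dim, out_dim)) / jnp.sqrt(in_dim)


def embed(encoder_fn, projection, inputs):
    """Run any encoder, then project its pooled output into the shared space."""
    features = encoder_fn(inputs)   # (batch, encoder_dim)
    return features @ projection    # (batch, shared_dim)


# Stand-in encoders; in practice these would wrap KoBERT and ViT.
def fake_text_encoder(token_ids):
    return jnp.ones((token_ids.shape[0], 768))


def fake_image_encoder(pixels):
    return jnp.ones((pixels.shape[0], 1024))


key_text, key_image = jax.random.split(jax.random.PRNGKey(0))
text_proj = init_projection(key_text, 768, 512)
image_proj = init_projection(key_image, 1024, 512)

text_emb = embed(fake_text_encoder, text_proj, jnp.zeros((4, 16), dtype=jnp.int32))
image_emb = embed(fake_image_encoder, image_proj, jnp.zeros((4, 224, 224, 3)))
```

Swapping in a different Korean LM would then only mean passing another `encoder_fn` and sizing its projection to match that encoder's output dimension.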

Desired Outcomes

The final deliverable of this project will most likely be an open source repository, accompanying documentation, model weights, and potentially a demo Streamlit app.

tree-park · June 25, 2021, 2:14pm

Great! Let me join this fun project!

jaketae · June 25, 2021, 2:42pm

Thanks everyone! Please feel free to share any thoughts you have regarding the direction or details of this project. Looking forward to the next couple of weeks.

jaketae · June 26, 2021, 1:52pm

Agreed. I mean I don’t think timezone should ever prevent anyone from joining, but for the purposes of arranging logistics, it would certainly be helpful. I’m also on KST/GMT+9.

srisweet · June 27, 2021, 5:31am

Hello @jaketae & Team,
I am interested in being part of such an amazing project & team. I will try my best to contribute to the Korean version of the CLIP model. It would be nice if we could discuss some learning resources that would be useful for this project. I can work in any time zone that is comfortable for everyone on the team.

valhalla · June 28, 2021, 4:34pm

Great! Let’s officially define this project.

Putting everybody in the official sheet here. More people can still join! Leave a comment here or on the sheet if you want to change something.

junhsss · June 28, 2021, 4:56pm

IMHO aligning Korean encoders with the pre-trained CLIP text encoder will probably suffice. It would be great if we could do better. Count me in (I’m on GMT+9)

valhalla · June 28, 2021, 5:06pm

Added you to the team

kyungeun · June 30, 2021, 1:08am

Hi @jaketae & Team,
I am interested in being part of the KoCLIP team. I am Korean and will try my best to contribute to this project as much as I can. It would be nice if I could participate in this project and discuss it. I can work in any time zone that is comfortable for everyone on the team.

valhalla · June 30, 2021, 9:11am

Hi @amphora, @kyungeun, added you to the team
