KoCLIP: Pretraining CLIP on Korean

KoCLIP

Building on top of Korean language models that are publicly available, we want to train a multimodal vision-language model. Specifically, we train CLIP on Korean datasets using KoBERT and ViT as backbones.
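For context, CLIP is trained with a symmetric contrastive (InfoNCE) objective over a batch of image-text pairs: matched pairs sit on the diagonal of a similarity matrix, and cross-entropy is applied in both the image-to-text and text-to-image directions. A minimal NumPy sketch of that objective (the actual project would use JAX/Flax; the temperature value and toy batch here are illustrative, not from the project):

```python
import numpy as np

def clip_contrastive_loss(image_emb: np.ndarray, text_emb: np.ndarray,
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    # L2-normalize so the dot product is cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # (batch, batch) similarity matrix; matched pairs lie on the diagonal.
    logits = image_emb @ text_emb.T / temperature

    def xent(l):
        # Row-wise softmax cross-entropy with the diagonal as the target class.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the image->text and text->image directions.
    return float(0.5 * (xent(logits) + xent(logits.T)))

# Toy batch: 4 pairs of 8-dim embeddings.
rng = np.random.default_rng(0)
img, txt = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
loss = clip_contrastive_loss(img, txt)
```

Perfectly aligned embeddings (image and text towers agreeing) drive this loss toward zero, which is what pulls the two modalities into a shared space.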

Model

ViT, KoBERT (or any other Korean encoder LM)

Datasets

KETI has released a Korean image captioning dataset, available on AI Hub. We can also utilize the WIT (Wikipedia-based Image Text) dataset, a multilingual dataset scraped from Wikipedia.

Training Script

The training script is (almost) already available, see here.

Challenges

  1. KoBERT likely performs slightly worse than the default English BERT. While this could be a bottleneck, we can always write the script in such a way that it is easy to plug-and-play different LMs. This way, when a better Korean LM is released, it can easily be used.
  2. There might be minor architectural adjustments we have to make (e.g. adding projection layers).
  3. Attaining fluency in JAX will take time and effort.
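On point 2, a minimal sketch of what such projection layers would do, in plain NumPy (the dimensions are assumptions for illustration: 768-d pooled encoder outputs, a 512-d shared space, and bias-free linear maps as in CLIP; the real code would live in Flax):

```python
import numpy as np

# Assumed dims: KoBERT and ViT both emit 768-d pooled features, projected
# into a shared 512-d space by separate learned linear maps (no bias).
TEXT_DIM, IMAGE_DIM, SHARED_DIM = 768, 768, 512

rng = np.random.default_rng(42)
text_proj = rng.normal(scale=0.02, size=(TEXT_DIM, SHARED_DIM))
image_proj = rng.normal(scale=0.02, size=(IMAGE_DIM, SHARED_DIM))

def project(features: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Map encoder outputs into the shared embedding space and L2-normalize."""
    z = features @ weight
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

text_feat = rng.normal(size=(4, TEXT_DIM))    # stand-in for pooled KoBERT outputs
image_feat = rng.normal(size=(4, IMAGE_DIM))  # stand-in for pooled ViT outputs
text_emb = project(text_feat, text_proj)      # (4, 512)
image_emb = project(image_feat, image_proj)   # (4, 512)
```

Because each tower gets its own projection, the two encoders' native hidden sizes do not have to match; only the shared dimension does.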

Desired Outcomes

The final deliverables of this project will most likely be an open-source repository with accompanying documentation, model weights, and potentially a demo Streamlit app.


Great! Let me join this fun project!


Thanks everyone! Please feel free to share any thoughts you have regarding the direction or details of this project. Looking forward to the next couple of weeks.

Agreed. I mean I don’t think timezone should ever prevent anyone from joining, but for the purposes of arranging logistics, it would certainly be helpful. I’m also on KST/GMT+9.

Hello @jaketae & Team,
I am interested in being part of such an amazing project and team. I will try my best to contribute to the Korean version of the CLIP model. It would be nice if we could discuss some learning resources that would be useful for this project. I can work in any time zone that is comfortable for everyone on the team.

Great! Let’s officially define this project 🙂

Putting everybody in the official sheet here. More people can still join! Leave a comment here or on the sheet if you want to change something.


IMHO, aligning Korean encoders with the pre-trained CLIP text encoder will probably suffice. It would be great if we could do better. Count me in ☺️ (I’m on GMT+9)
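One way to realize this alignment, as done in some multilingual CLIP follow-up work, is teacher-student distillation: freeze the pre-trained CLIP text encoder, feed it (translated) captions, and train the Korean encoder to regress the frozen embeddings. A minimal NumPy sketch of such a loss, using synthetic stand-in embeddings (the 512-d size and the regression objective are illustrative assumptions, not a decided design):

```python
import numpy as np

def alignment_loss(student_emb: np.ndarray, teacher_emb: np.ndarray) -> float:
    """MSE between Korean-encoder outputs and frozen CLIP text embeddings
    of the same (translated) captions; the teacher is never updated."""
    return float(np.mean((student_emb - teacher_emb) ** 2))

# Toy batch: 4 captions, 512-d embeddings from each tower.
rng = np.random.default_rng(1)
teacher = rng.normal(size=(4, 512))                   # frozen CLIP text tower
student = teacher + 0.1 * rng.normal(size=(4, 512))   # partially aligned student
loss = alignment_loss(student, teacher)
```

The appeal of this route is that the CLIP image tower needs no retraining at all: once the Korean text embeddings land in CLIP's space, the existing image encoder works unchanged.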

Added you to the team 🙂


Hi @jaketae & Team,
I am interested in being part of the KoCLIP team. I am Korean and will try my best to contribute to this project as much as I can. It would be nice to participate in this project and discuss it together. I can work in any time zone that is comfortable for everyone on the team.


Hi @amphora, @kyungeun added you to the team 🙂