Building on top of publicly available Korean language models, we want to train a multimodal generative system. Specifically, we train CLIP on Korean datasets using KoBERT and ViT as backbones.
Model
ViT, KoBERT (or any other Korean encoder LM)
Datasets
KETI has released a Korean image captioning dataset, available on AI Hub. We can also utilize the WIT dataset, a multilingual image-text dataset scraped from Wikipedia.
Training Script
The training script is (almost) already available; see here.
Challenges
KoBERT likely performs slightly worse than the default English BERT. While this could be a bottleneck, we can always write the script in such a way that it is easy to plug-and-play different LMs. This way, when a better Korean LM is released, it can easily be used.
There might be minor architectural adjustments we have to make (e.g. adding projection layers).
Attaining fluency in JAX will take time and effort.
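To make the projection-layer point concrete, here is a minimal JAX sketch of how pooled ViT and KoBERT features could be projected into a shared space and trained with a CLIP-style symmetric contrastive loss. This is only an illustration under assumptions, not the actual training script: the helper names (`init_projection`, `project_and_normalize`, `clip_loss`), the feature and embedding sizes, the fixed temperature of 0.07, and the random stand-in features are all hypothetical.

```python
import jax
import jax.numpy as jnp


def init_projection(key, in_dim, out_dim):
    # Hypothetical helper: bias-free linear projection with scaled normal init.
    return jax.random.normal(key, (in_dim, out_dim)) / jnp.sqrt(in_dim)


def project_and_normalize(features, weight):
    # Map encoder features into the shared space and L2-normalize them,
    # so that dot products below are cosine similarities.
    embeds = features @ weight
    return embeds / jnp.linalg.norm(embeds, axis=-1, keepdims=True)


def clip_loss(image_embeds, text_embeds, temperature=0.07):
    # Pairwise similarity matrix: row i = image i, column j = caption j.
    logits = image_embeds @ text_embeds.T / temperature

    def cross_entropy(lg):
        # Matching pairs sit on the diagonal, so the target class for row i is i.
        return -jnp.mean(jnp.diag(jax.nn.log_softmax(lg, axis=-1)))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))


# Stand-in features; in practice these would be the pooled outputs of ViT and
# the Korean text encoder. All sizes here are placeholders.
key = jax.random.PRNGKey(0)
k_img, k_txt, k_wi, k_wt = jax.random.split(key, 4)
image_features = jax.random.normal(k_img, (4, 768))
text_features = jax.random.normal(k_txt, (4, 768))

w_image = init_projection(k_wi, 768, 512)
w_text = init_projection(k_wt, 768, 512)

loss = clip_loss(
    project_and_normalize(image_features, w_image),
    project_and_normalize(text_features, w_text),
)
print(loss)
```

Because the loss only sees the projected embeddings, swapping KoBERT for another Korean LM would only require changing the input size of the text projection, which fits the plug-and-play goal above.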
Desired Outcomes
The final deliverable of this project will most likely be an open-source repository, accompanying documentation, model weights, and potentially a demo Streamlit app.
Thanks everyone! Please feel free to share any thoughts you have regarding the direction or details of this project. Looking forward to the next couple of weeks.
Agreed. I mean I don’t think timezone should ever prevent anyone from joining, but for the purposes of arranging logistics, it would certainly be helpful. I’m also on KST/GMT+9.
Hello @jaketae & Team,
I would love to be a part of such an amazing project & team. I will try my best to contribute to the Korean version of the CLIP model. It would be nice if we could discuss some learning resources that would be useful for this project. I can work in any time zone that is comfortable for everyone on the team.
IMHO aligning Korean encoders with the pre-trained CLIP text encoder will probably suffice. It would be great if we could do better. Count me in (I’m on GMT+9).
Hi @jaketae & Team,
I am interested in being a part of the KoCLIP team. I am Korean and will try my best to contribute to this project as much as I can. It would be nice if I could participate in this project and discuss it. I can work in any time zone that is comfortable for everyone on the team.