Currently, there are no open-source language models for the Georgian language. I have a modest-sized dataset that I want to use to pre-train RoBERTa for Georgian from scratch.
The model will be trained on Georgian-language data from the following sources:
- a Common Crawl dump
- random web scrapes
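Raw web-scraped text usually needs basic cleanup before pre-training. Below is a minimal sketch (not part of the original plan; the helper name and sample strings are illustrative) that normalizes whitespace and drops exact duplicate lines, two common steps when preparing a Common Crawl or scraped corpus:

```python
import hashlib
import re


def clean_and_dedup(lines):
    """Normalize whitespace and drop exact duplicate lines from scraped text."""
    seen = set()
    out = []
    for line in lines:
        text = re.sub(r"\s+", " ", line).strip()
        if not text:
            continue  # skip empty lines
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            out.append(text)
    return out


# Example with Georgian strings; the two variants collapse to one entry.
corpus = ["ქართული  ენა ", "ქართული ენა", "", "საქართველო"]
print(clean_and_dedup(corpus))
```

Near-duplicate filtering (e.g. by shingling or MinHash) may also be worthwhile for web scrapes, but exact deduplication is a cheap first pass.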
There are already Flax scripts for pre-training RoBERTa that we can use directly: