Pre-train RoBERTa from Scratch for Georgian Language
Currently, there are no open-source language models for the Georgian language. I have a relatively small dataset that I want to use to pre-train RoBERTa for Georgian from scratch.
2. Language
The model will be trained in Georgian.
3. Model
RoBERTa
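As a starting point, a roberta-base sized model could be instantiated from scratch with Flax. This is only a minimal sketch; the hyperparameters are the usual roberta-base defaults and the `./georgian-roberta` output directory is a hypothetical name.

```python
from transformers import RobertaConfig, FlaxRobertaForMaskedLM

# roberta-base sized configuration; vocab_size must match the Georgian
# tokenizer we train later (50265 is just the roberta-base default).
config = RobertaConfig(
    vocab_size=50265,
    max_position_embeddings=514,
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
    type_vocab_size=1,
)

# Randomly initialised Flax model (no pre-trained weights are loaded).
model = FlaxRobertaForMaskedLM(config)

# Saving the config lets the Flax pre-training script pick it up later
# via --config_name ./georgian-roberta (hypothetical path).
config.save_pretrained("./georgian-roberta")
```

Given the small amount of data, a smaller configuration (fewer layers or a smaller hidden size) might also be worth trying.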
4. Datasets
Wikipedia dump
Common Crawl dump
random web scrapes
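These sources could be combined into one text corpus with the `datasets` library. A rough sketch, assuming the Wikipedia config name `20220301.ka`, the OSCAR config `unshuffled_deduplicated_ka` for the Common Crawl portion, and a hypothetical `scraped/*.txt` glob for the web scrapes:

```python
from datasets import load_dataset, concatenate_datasets

# Georgian Wikipedia dump; non-preprocessed languages need apache-beam
# installed, and the exact dump date is an assumption.
wiki = load_dataset("wikipedia", "20220301.ka", split="train",
                    beam_runner="DirectRunner")

# OSCAR is a deduplicated Common Crawl extraction; this is its Georgian subset.
oscar = load_dataset("oscar", "unshuffled_deduplicated_ka", split="train")

# Random web scrapes collected as plain-text files (hypothetical path).
scraped = load_dataset("text", data_files={"train": "scraped/*.txt"}, split="train")

def text_only(ds):
    # Keep only the raw text column so the three datasets share one schema.
    return ds.remove_columns([c for c in ds.column_names if c != "text"])

corpus = concatenate_datasets([text_only(wiki), text_only(oscar), text_only(scraped)])
corpus.to_json("georgian_corpus.jsonl")  # or keep in memory for tokenizer training
```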
5. Training scripts
There are already Flax scripts to pre-train RoBERTa that we can easily use:
transformers/examples/flax/language-modeling (https://github.com/huggingface/transformers/tree/master/examples/flax/language-modeling)
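Before `run_mlm_flax.py` can be used, a Georgian byte-level BPE tokenizer has to be trained and saved next to the model config, since there is no existing Georgian checkpoint to reuse. A minimal sketch, assuming the combined corpus file from the Datasets section and the hypothetical `./georgian-roberta` directory; the script could then be pointed at that directory through arguments such as `--config_name` and `--tokenizer_name`:

```python
import os
from datasets import load_dataset
from tokenizers import ByteLevelBPETokenizer

# Combined Georgian corpus produced earlier (hypothetical file name).
corpus = load_dataset("json", data_files="georgian_corpus.jsonl", split="train")

def batch_iterator(batch_size=1000):
    # Stream the corpus in batches so the full text never sits in memory at once.
    for i in range(0, len(corpus), batch_size):
        yield corpus[i : i + batch_size]["text"]

# Same special tokens RoBERTa uses; vocab_size must match the model config above.
tokenizer = ByteLevelBPETokenizer()
tokenizer.train_from_iterator(
    batch_iterator(),
    vocab_size=50265,
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)

# The pre-training script loads the tokenizer from tokenizer.json in this directory.
os.makedirs("./georgian-roberta", exist_ok=True)
tokenizer.save("./georgian-roberta/tokenizer.json")
```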