I’m considering pretraining a model on an ancient Greek corpus. So far, I have not been able to find any transformer model for this language, so I would like to know whether somebody is already working on one before I start.
Second, I would like to know whether it makes sense to pretrain an ancient Greek (grc) model at all, given that ancient Greek corpora are very small compared to modern-language corpora. The largest available ancient Greek corpus is about 500 MB, there are no ancient Greek Wikipedia articles, and Google Books scans are unreliable because of the many diacritics used in grc.
Is there any particular way to approach languages with small datasets? I’m also interested in other ancient languages with small surviving corpora, thinking that transformers could be used for text restoration (filling gaps in damaged texts).
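For what it’s worth, the gap-filling idea maps naturally onto the masked-language-modeling objective used in BERT-style pretraining: hide some tokens and train the model to recover them, which is essentially the restoration task. Here is a minimal plain-Python sketch of the standard masking step (simplified: real BERT also replaces some selected tokens with random ones or leaves them unchanged; the example sentence and all names are purely illustrative):

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """BERT-style masking: hide roughly `mask_prob` of the tokens and keep
    the originals as labels the model must predict."""
    rng = rng or random.Random()
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)   # model is scored on predicting this token
        else:
            masked.append(tok)
            labels.append(None)  # position not scored
    return masked, labels

# Illustrative example on the opening of the Iliad:
tokens = "μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, rng=random.Random(42))
```

At inference time the same mechanism runs in reverse: you put `[MASK]` where the papyrus is broken and ask the trained model for its top predictions at that position.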