Tutorial request: pre-training a Donut base-sized model for a new language


I’m trying to pre-train a new Donut model on Romanian-language documents.

I have around 100k scanned documents, along with a metadata.jsonl formatted like the one SynthDoG generates:

{"file_name": "img_1.jpg", "ground_truth": "{\"gt_parse\": {\"text_sequence\": \"Text in Romanian language\"}}"}

I want to create a new pre-trained base model so that I can fine-tune on it afterwards.
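For anyone reproducing this setup: note that in the SynthDoG-style metadata.jsonl, `ground_truth` is itself a JSON string, so it needs a second `json.loads`. A minimal sketch for reading the pairs (the path is hypothetical, adjust to your layout):

```python
import json

def load_samples(path):
    """Yield (file_name, text_sequence) pairs from a SynthDoG-style metadata.jsonl."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            # ground_truth is a JSON string embedded in the outer JSON object,
            # so it has to be parsed a second time
            gt = json.loads(entry["ground_truth"])
            yield entry["file_name"], gt["gt_parse"]["text_sequence"]

# e.g. for name, text in load_samples("dataset/metadata.jsonl"): ...
```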

I’ve searched but I can’t figure out how to do it… Can anyone share the scripts for creating a new base model for a new language, or write a tutorial on how to pre-train a Donut model from scratch?


Hello @wyzixg,

You might find this discussion useful: Finetune Donut with new tokenizer - Greek
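The gist of that thread is to extend the multilingual tokenizer with language-specific tokens and resize the decoder embeddings before training. A rough sketch, assuming the Hugging Face `transformers` library and the `naver-clova-ix/donut-base` checkpoint (the token list below is illustrative, not a vetted Romanian vocabulary):

```python
def extend_tokenizer(tokenizer, model, new_tokens):
    """Add language-specific tokens and grow the decoder embedding matrix to match."""
    num_added = tokenizer.add_tokens(new_tokens)
    if num_added > 0:
        # The decoder's embedding table must cover the enlarged vocabulary
        model.decoder.resize_token_embeddings(len(tokenizer))
    return num_added

# Intended usage (downloads the checkpoint, so not run here):
# from transformers import DonutProcessor, VisionEncoderDecoderModel
# processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base")
# model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")
# extend_tokenizer(processor.tokenizer, model, ["ă", "â", "î", "ș", "ț"])
```

After that you would continue training on your Romanian images with the `text_sequence` strings as decoder targets, as in Donut's own pre-training task.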

I’ve tried that solution, but it isn’t working. If anyone who has succeeded in pre-training Donut from scratch can post the process / scripts they used, please help, I need it for school…