Creating a custom Donut model

I have a document understanding task and have created a dataset containing the images and their ground truth. The documents are in Bulgarian. I have tested the Donut model, but I don't have enough images to fine-tune it to understand Bulgarian.
After 15 epochs on a little over 600 images (not to mention the training time of more than 10 hours), I got the "impressive" mean accuracy of 0.01549120881489085.

I have read about the VisionEncoderDecoderModel class, and my impression is that I can use it to create a custom version of Donut, e.g. with a Swin model as the encoder and BERT (instead of BART) as the decoder. The idea is that BERT has checkpoints pretrained on my language. Is my understanding correct? I also watched a tutorial on working with VisionEncoderDecoderModel, but I would appreciate more insights. Specifically, I am not sure how to deal with the processor: which processor to use, and whether I need to do any other steps before initializing it (see the sketch below for roughly what I have in mind).
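
To make the question more concrete, here is a minimal sketch of what I imagine the setup would look like. The checkpoint names are placeholders (a Bulgarian BERT checkpoint would go in place of the multilingual one), and I am not sure the processor part is the right approach:

```python
from transformers import (
    AutoImageProcessor,
    AutoTokenizer,
    DonutProcessor,
    VisionEncoderDecoderModel,
)

# Placeholder checkpoints: a Bulgarian BERT would replace the multilingual one.
encoder_checkpoint = "microsoft/swin-base-patch4-window12-384-in22k"
decoder_checkpoint = "bert-base-multilingual-cased"

# Combine the vision encoder and the text decoder. Cross-attention layers are
# added to the decoder automatically and start out randomly initialized.
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    encoder_checkpoint, decoder_checkpoint
)

# Tell the model which special tokens to use during generation.
tokenizer = AutoTokenizer.from_pretrained(decoder_checkpoint)
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id
model.config.eos_token_id = tokenizer.sep_token_id

# Pair the encoder's image processor with the decoder's tokenizer in one processor
# (not sure if DonutProcessor is the right wrapper here, or if another one is needed).
image_processor = AutoImageProcessor.from_pretrained(encoder_checkpoint)
processor = DonutProcessor(image_processor=image_processor, tokenizer=tokenizer)

# Intended preprocessing per example:
# pixel_values = processor(image, return_tensors="pt").pixel_values
# labels = processor.tokenizer(target_text, return_tensors="pt").input_ids
```

Is this the right way to wire the processor together, or are there additional steps (e.g. resizing/config changes) needed before training?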
