Donut fine tuning question

tsegaran · October 16, 2023, 3:20pm

Hi,

I followed the setup in "Document AI: Fine-tuning Donut for document-parsing using Hugging Face Transformers" (philschmid.de) to fine-tune Donut on a custom dataset. My input is a CSV file that looks like this:

image_path,ground_truth
AccessCode.png,"{""gt_parse"":{""roll_number"":""050 065 14020 0000"",""tax_year"":""REPRINT-2014"",""tax_amount"":""7784"",""tax_due_date"":""\u2018Aug. 14, 2014"",""property_address"":""1234 FRANCIS ST PLAN 1274 LOT 15 NRSFR"",""municipality"":""City of CAMBRIDGE"",""borrower_name"":""DOE JOHN DOE JANE""}}"
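
As a sanity check on this format, the sketch below reads a row shaped like the one above and confirms that the ground_truth cell is valid JSON once the CSV quoting is undone. The helper name load_ground_truth and the shortened sample row are illustrative assumptions and are not part of the tutorial.

```python
import csv
import io
import json

# One row in the same shape as the CSV above (fields shortened for illustration).
SAMPLE_CSV = (
    "image_path,ground_truth\n"
    'AccessCode.png,"{""gt_parse"":{""roll_number"":""050 065 14020 0000"",'
    '""tax_amount"":""7784""}}"\n'
)


def load_ground_truth(csv_text):
    """Return {image_path: gt_parse dict}; raise ValueError on malformed JSON."""
    parsed = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            payload = json.loads(row["ground_truth"])
        except json.JSONDecodeError as exc:
            raise ValueError(f"bad JSON for {row['image_path']}: {exc}") from exc
        parsed[row["image_path"]] = payload["gt_parse"]
    return parsed


records = load_ground_truth(SAMPLE_CSV)
print(records["AccessCode.png"]["roll_number"])
```

If a row fails here, the labels never reached training in the intended form, which is worth ruling out before looking at the model itself.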

I managed to train the model, but as a trial I only used 9 training rows. When I run the code below to test it, I only get an empty response back. Is this because my training set is too small?

from transformers import DonutProcessor, VisionEncoderDecoderModel
import re
import torch
from PIL import Image

model_path = 'C:/ocr/model_trained_donut'
fileName = 'C:/ocr/AccessCode.png'

processor = DonutProcessor.from_pretrained(model_path)
model = VisionEncoderDecoderModel.from_pretrained(model_path)

image = Image.open(fileName)
pixel_values = processor(image, return_tensors="pt").pixel_values
print(pixel_values.shape)

task_prompt = "<s>"
decoder_input_ids = processor.tokenizer(task_prompt, add_special_tokens=False, return_tensors="pt")["input_ids"]

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

outputs = model.generate(pixel_values.to(device),
                               decoder_input_ids=decoder_input_ids.to(device),
                               max_length=model.decoder.config.max_position_embeddings,
                               early_stopping=False,
                               pad_token_id=processor.tokenizer.pad_token_id,
                               eos_token_id=processor.tokenizer.eos_token_id,
                               use_cache=True,
                               num_beams=1,
                               bad_words_ids=[[processor.tokenizer.unk_token_id]],
                               return_dict_in_generate=True,
                               output_scores=True,)

# Decode the generated ids, then strip the EOS and padding tokens
sequence = processor.batch_decode(outputs.sequences)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()  # remove first task start token
print(sequence)

# Convert the remaining tag sequence to JSON
print(processor.token2json(sequence))

Output:

Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
model loaded
C:\Users\thaban.segaran\AppData\Roaming\Python\Python39\site-packages\transformers\generation\utils.py:1421: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use and modify the model generation configuration (see https://huggingface.co/docs/transformers/generation_strategies#default-text-generation-configuration )
  warnings.warn(
C:\Users\thaban.segaran\AppData\Roaming\Python\Python39\site-packages\transformers\generation\configuration_utils.py:399: UserWarning: `num_beams` is set to 1. However, `early_stopping` is set to `True` -- this flag is only used in beam-based generation modes. You should set `num_beams>1` or unset `early_stopping`.
  warnings.warn(
Reference:

Prediction:
 {'text_sequence': '<s></s>'}
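
To make the empty result concrete, here is a minimal sketch of the same post-processing applied to plain strings, with no model involved. It assumes the tokenizer's EOS and padding tokens are `</s>` and `<pad>`; the real values come from processor.tokenizer. The helper name strip_special_tokens and the second sample string are hypothetical.

```python
import re


def strip_special_tokens(sequence, eos_token="</s>", pad_token="<pad>"):
    """Mirror the post-processing above on a plain string.

    eos_token and pad_token are assumed values for illustration only.
    """
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    # Remove only the first tag, i.e. the task start token.
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


# The prediction shown above: task start token followed directly by EOS.
print(repr(strip_special_tokens("<s></s>")))

# A hypothetical non-empty generation, for comparison.
print(strip_special_tokens("<s><s_tax_amount>7784</s_tax_amount></s>"))
```

Under these assumptions, `<s></s>` reduces to an empty string, so there is nothing left to convert to JSON: the model emitted the end-of-sequence token immediately after the prompt, and the decoding step is not what discards the content.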
