Finetuned Donut model taking too much time for inference on local machine, around 5 minutes

My finetuned Donut model is taking 4 minutes 37 seconds for inference on my local Windows laptop, which has 16GB RAM and 4 cores. However, inference time is under 5 seconds on a Google Colab CPU machine with 32GB RAM. On a Colab GPU, inference time is under a second.

Why is it taking so much time on my local Windows machine? This doesn’t seem like normal behavior. Could someone help and guide me on what could be wrong here?

I am using Transformers version 4.28.1, and it’s the same version on my Windows machine as well.

Also, below is the prediction function I am using, and it’s the model.generate method that is taking the time.

import re  # needed for the regex cleanup below

def run_prediction(image):
    # Preprocess the input image into pixel values for the Donut encoder
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # Autoregressive decoding; this is the call that takes ~4.5 minutes locally
    outputs = model.generate(
        pixel_values.to(device),
        decoder_input_ids=decoder_input_ids.to(device),
        max_length=model.decoder.config.max_position_embeddings,
        early_stopping=True,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        num_beams=1,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True)

    # Decode the generated token ids, strip special tokens, and remove the first task start token
    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
    sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()
    return processor.token2json(sequence)
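
For context, processor, model, device, and decoder_input_ids are set up earlier in my script, roughly as follows (the checkpoint path and task prompt below are placeholders; the actual values come from my fine-tuning setup):

import torch
from transformers import DonutProcessor, VisionEncoderDecoderModel

# Placeholder path to my fine-tuned Donut checkpoint
processor = DonutProcessor.from_pretrained("path/to/finetuned-donut")
model = VisionEncoderDecoderModel.from_pretrained("path/to/finetuned-donut")

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

# Placeholder task start token; the real one comes from the fine-tuning setup
task_prompt = "<s_my-task>"
decoder_input_ids = processor.tokenizer(
    task_prompt, add_special_tokens=False, return_tensors="pt"
).input_ids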

I’d first debug whether it’s due to the generate() method or the token2json() method (you can leverage Python’s time module for that).
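
For example, something along these lines (assuming the same processor, model, device, and decoder_input_ids as in your run_prediction above) would show where the time is going:

import time

def run_prediction_timed(image):
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # Time the generation step
    t0 = time.perf_counter()
    outputs = model.generate(
        pixel_values.to(device),
        decoder_input_ids=decoder_input_ids.to(device),
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        num_beams=1,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True)
    print(f"generate(): {time.perf_counter() - t0:.2f}s")

    sequence = processor.batch_decode(outputs.sequences)[0]
    sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
    sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()

    # Time the JSON conversion step
    t0 = time.perf_counter()
    result = processor.token2json(sequence)
    print(f"token2json(): {time.perf_counter() - t0:.2f}s")
    return result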

I have checked; it’s the generate() method that is taking the time.

Hi @shubh1608: did you find the cause/fix for this problem?