Inference slower after fine tuning

jlamon · February 14, 2024, 12:25pm

Hello,

Before fine-tuning, the model takes 2 minutes to generate predictions on the dataset. After training, it takes around 12 minutes on the same test set.

Does anybody know what could cause this? Moving the model to CUDA raises an error.

Note: I’m using SFTTrainer and QLoRA.

Thank you in advance for your answer.

dblakely · February 14, 2024, 5:40pm

Does the finetuned model write a lot more than the base model?
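Generation time grows with the number of tokens produced, so one rough way to check is to record both the wall-clock time and the number of new tokens per prompt for each model. The sketch below is only an illustration: `profile_generation` and `generate_fn` are hypothetical names, and `generate_fn` is assumed to be a wrapper you write around your own model that takes one prompt and returns the list of newly generated token ids (prompt tokens excluded).

```python
import time
from statistics import mean


def profile_generation(generate_fn, prompts):
    """Return (mean seconds per prompt, mean new tokens per prompt).

    generate_fn is a hypothetical callable supplied by the caller: it takes
    one prompt string and returns the list of newly generated token ids.
    """
    seconds, new_tokens = [], []
    for prompt in prompts:
        start = time.perf_counter()
        token_ids = generate_fn(prompt)
        seconds.append(time.perf_counter() - start)
        new_tokens.append(len(token_ids))
    return mean(seconds), mean(new_tokens)
```

Running this once with a wrapper around the base model and once with a wrapper around the fine-tuned model, on the same prompts, gives two pairs of numbers to compare. If the fine-tuned model produces far more tokens per prompt, output length likely accounts for much of the slowdown. If the token counts are similar, the cause is probably elsewhere.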

jlamon · February 14, 2024, 5:54pm

Hey, no it does not.

I’m framing a classification task through prompts, so the model only returns a few words (the label’s name).
