Inference slower after fine-tuning


Before fine-tuning, the model takes about 2 minutes to produce predictions on the dataset. After training, it takes around 12 minutes on the same test set.

Does anybody know the reason for this issue? Moving the model to CUDA gives an error.

Note: I’m using SFTTrainer and QLoRA.

Thanks in advance for your answers.

Does the fine-tuned model generate a lot more text than the base model?

Hey, no, it doesn’t.

I’m framing it as a classification task through prompts, so the model only returns a few words (the label name).
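For what it’s worth, a common cause of this kind of slowdown with QLoRA is running inference on the un-merged adapter: every forward pass pays for dequantizing the 4-bit base weights plus the extra LoRA matmuls. Merging the adapter back into the base weights (e.g. PEFT’s `merge_and_unload()`) avoids the per-step overhead. A minimal NumPy sketch of why the merge is mathematically safe (illustrative shapes and names, not the actual model):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                      # hidden size and LoRA rank (toy values)
W = rng.standard_normal((d, d))  # frozen base weight
A = rng.standard_normal((r, d))  # LoRA down-projection
B = rng.standard_normal((d, r))  # LoRA up-projection
alpha = 4                        # LoRA scaling factor

x = rng.standard_normal(d)

# Un-merged: every forward pass does two extra matmuls on top of W @ x.
y_unmerged = W @ x + (alpha / r) * (B @ (A @ x))

# Merged: fold the adapter into W once, then inference is a single matmul.
W_merged = W + (alpha / r) * (B @ A)
y_merged = W_merged @ x

assert np.allclose(y_unmerged, y_merged)
```

The two paths produce identical outputs, so merging trades a one-time weight update for cheaper per-token inference.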