Using the Trainer class with a 4/8-bit quantised model for prediction

Hi all,

I would like to run trainer.predict for generation with, e.g., mistralai/Mistral-7B-v0.1.

The trainer is initialised like this:

trainer = Seq2SeqTrainer(
    model,
    args=training_args,
    ...
)
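
The model itself is loaded in 4 bit roughly like this (a minimal sketch; the exact BitsAndBytesConfig settings here are my assumption):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantisation config (assumed settings; load_in_8bit=True would be the 8-bit case)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)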

However, when I initialise the trainer with the model in 4 or 8 bit, I get the following error:

ValueError: You cannot perform fine-tuning on purely quantized models. Please attach trainable adapters on top of the quantized model to correctly perform fine-tuning. Please see: Load adapters with 🤗 PEFT for more details

Well, I don’t want to perform fine-tuning, just prediction.
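
Concretely, the only call I need is something like this (test_dataset is a placeholder for my evaluation data, with predict_with_generate=True set in the training args):

predictions = trainer.predict(test_dataset)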

Any smart way to work around this error?
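
(The fallback I’d rather avoid is dropping the Trainer entirely and running a plain generate() loop, sketched here with a hypothetical tokenizer and prompt:)

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
inputs = tokenizer("Some prompt", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))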

Thanks for your help!

Cheers,
Stephan