Inference after QLoRA fine-tuning

I’ve fine-tuned a model via QLoRA by following this notebook: Google Colab, and pushed the adapter weights to the Hugging Face Hub.
When it comes time to predict with the base model + adapters, should I quantize the base model again, given that the adapters were trained alongside a frozen, quantized base model?
Or is it valid to load the base model unquantized, attach/merge the adapters as usual, and predict away? (I’ve sketched the quantized variant I have in mind after the inference snippet below.)


## TRAINING
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, prepare_model_for_kbit_training
from trl import SFTTrainer

bnb_config = BitsAndBytesConfig(...)  # 4-bit quantization settings for QLoRA
model = AutoModelForCausalLM.from_pretrained(..., quantization_config=bnb_config)  # base model is loaded quantized; adapters are fit alongside it
model = prepare_model_for_kbit_training(model)  # freeze base weights and prep the model for k-bit training
peft_config = LoraConfig(...)
trainer = SFTTrainer(model=model, peft_config=peft_config, ...)
trainer.train()
trainer.push_to_hub()  # pushes the adapter weights only, not the (quantized) base model
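
For reference, the elided configs above follow the usual QLoRA recipe from that notebook; I believe they were roughly this, but treat the exact values as an assumption rather than what I actually ran:

import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# Assumed 4-bit NF4 setup typical of QLoRA notebooks; my actual values may differ.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Assumed LoRA hyperparameters, again in the spirit of the notebook rather than exact.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)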

## INFERENCE
from transformers import AutoModelForCausalLM
from peft import PeftConfig, PeftModel

peft_config = PeftConfig.from_pretrained(<hub_id>)  # reads the adapter config from the Hub
base_model = AutoModelForCausalLM.from_pretrained(peft_config.base_model_name_or_path, device_map="auto")  # should I quantize here, as I did when fitting the adapters with QLoRA?
peft_model = PeftModel.from_pretrained(base_model, <hub_id>)  # attach the trained adapters
model = peft_model.merge_and_unload()  # fold the adapters into the (unquantized) base weights
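
And this is the quantized-at-inference variant I’m weighing against it: reload the base model with the same quantization config and just attach the adapters without merging (I’m not sure merging into 4-bit weights is even supported). A rough sketch, assuming the same bnb settings as in training:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftConfig, PeftModel

peft_config = PeftConfig.from_pretrained(<hub_id>)

# Re-create the quantization config used during training (values assumed, as above).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    peft_config.base_model_name_or_path,
    quantization_config=bnb_config,
    device_map="auto",
)

# Keep the adapters separate (no merge_and_unload) so the 4-bit base weights stay untouched.
model = PeftModel.from_pretrained(base_model, <hub_id>)
model.eval()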