Using LoRA Adapters

Hello,
for fine-tuning Llama 2-7B with LoraConfig, I quantized the model to int8.
After that, I saved the LoRA adapters.
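
For context, my fine-tuning setup looked roughly like this (the LoRA hyperparameters, target modules, and output path below are placeholders rather than my exact values):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model in 8-bit for training
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    device_map="auto",
    load_in_8bit=True,
)
base_model = prepare_model_for_kbit_training(base_model)

# Placeholder LoRA settings
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)

# ... training loop / Trainer goes here ...

# This saves only the LoRA adapter weights, not the base model
model.save_pretrained(path_to_the_adapters)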

Now, for inference, I have to load the base model and then load the adapters into it, like this:

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

model_id = "meta-llama/Llama-2-7b-chat-hf"

# Load the base model in fp16
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    cache_dir="/mnt/video/llm_models/",
    torch_dtype=torch.float16,
)

# Load the saved LoRA adapters on top of the base model
model = PeftModel.from_pretrained(model, path_to_the_adapters)
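
For completeness, this is roughly how I run generation once the adapters are loaded (the prompt and generation settings are just examples):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Example prompt"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a short completion as a sanity check
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))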

Should I use the base Llama 2 model for inference, as above, or the 8-bit quantized one below, which is what I fine-tuned the adapters on?

model_id = "meta-llama/Llama-2-7b-chat-hf"

# Load the base model in 8-bit, the same way it was loaded for fine-tuning
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    cache_dir="/mnt/video/llm_models/",
    load_in_8bit=True,
)

model = PeftModel.from_pretrained(model, path_to_the_adapters)