Loaded adapter seems ignored

Hello everyone, I’m an absolute beginner, trying to do some cool stuff.

I fine-tuned a model with QLoRA, using 'NousResearch/Llama-2-7b-chat-hf' as the base model and my own dataset, a .jsonl file where each line has the format {"text": "<s>[INST] Generate programming exercise of type xxx. [/INST] A programming exercise...</s>"}. I then saved the adapter with trainer.model.save_pretrained(path_adapters, safe_serialization=True).
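
For reference, one line of that dataset could be produced roughly like this. It is only a minimal sketch: make_record and the example strings are hypothetical and are not taken from the actual training script.

```python
import json


def make_record(instruction, answer):
    """Build one JSONL line in the Llama-2 chat format described above.

    make_record is a hypothetical helper used only for illustration.
    """
    text = f"<s>[INST] {instruction} [/INST] {answer}</s>"
    # json.dumps escapes quotes and newlines so the record stays on one line.
    return json.dumps({"text": text})


line = make_record(
    "Generate programming exercise of type xxx.",
    "A programming exercise...",
)
```

Each call gives one line of the .jsonl file, so the dataset is just these lines written out one after another.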

Now I’m trying to load the original model 'NousResearch/Llama-2-7b-chat-hf' together with the adapter and run a query locally, but the output looks as if the adapter (the fine-tuning) is not being applied at all.

from peft import PeftModelForCausalLM
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_path = r'C:\MODEL\Llama-2-7b-chat-hf'
adapter_path = r'C:\MODEL\adapters'
adapter_name = 'adapter_1'
device = 'cpu'

tokenizer = AutoTokenizer.from_pretrained(base_model_path)
tokenizer.pad_token = tokenizer.eos_token

base_model = AutoModelForCausalLM.from_pretrained(base_model_path)
base_model.config.use_cache = False

adapted_model = PeftModelForCausalLM.from_pretrained(base_model, adapter_path)
# adapted_model.load_adapter(adapter_path, adapter_name=adapter_name)
# adapted_model.set_adapter(adapter_name)
# adapted_model.enable_adapters()

prompt = 'Generate programming exercise of type xxx.'
input_ids = tokenizer(prompt, return_tensors='pt', truncation=True).input_ids
outputs = adapted_model.generate(input_ids=input_ids, max_new_tokens=512, do_sample=False)
output = tokenizer.decode(outputs[0], skip_special_tokens=True)

Keep in mind that I’m trying to run this with both the model and adapters locally using the CPU.
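
One thing that may be worth checking in the script above: the training records wrap the instruction in [INST] ... [/INST], while the prompt passed to generate is the bare instruction. Below is a minimal sketch of building the prompt in the training format. build_inference_prompt is a hypothetical helper, and the sketch assumes the Llama tokenizer prepends the <s> token itself (which I believe is its default), so <s> is left out of the string.

```python
def build_inference_prompt(instruction):
    """Wrap an instruction in the same [INST] template used in the training data.

    Hypothetical helper. The leading <s> is omitted on the assumption that
    the tokenizer adds the beginning-of-sequence token automatically.
    """
    return f"[INST] {instruction} [/INST]"


prompt = build_inference_prompt('Generate programming exercise of type xxx.')
```

The resulting prompt would then be tokenized and passed to generate exactly as in the script; whether this changes the output is something I have not verified.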

Calling enable_adapters fails, and I’m not sure if or when I should use it. Eventually I’ll have several adapters and switch between them, but for now I just want to see results with one.
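
For the multi-adapter case, this is roughly what I have in mind. It is a sketch under assumptions, not a tested setup: load_with_adapters is a hypothetical helper, and it relies on PeftModel.from_pretrained, load_adapter and set_adapter from peft.

```python
def load_with_adapters(base_model_path, adapter_paths):
    """Load a base model and attach several named LoRA adapters.

    Hypothetical helper. adapter_paths maps an adapter name to the local
    directory that holds that adapter's saved files.
    """
    if not adapter_paths:
        raise ValueError("adapter_paths must contain at least one adapter")

    # Imported here so the helper can be defined without loading the libraries.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base_model = AutoModelForCausalLM.from_pretrained(base_model_path)

    names = list(adapter_paths)
    # The first adapter is attached when the PeftModel wrapper is created.
    model = PeftModel.from_pretrained(
        base_model, adapter_paths[names[0]], adapter_name=names[0]
    )
    # Any further adapters are loaded into the same wrapper under their own names.
    for name in names[1:]:
        model.load_adapter(adapter_paths[name], adapter_name=name)
    return model
```

After loading, model.set_adapter('adapter_1') would select which adapter is active before calling generate.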

Thanks in advance.