Hello everyone, I’m an absolute beginner, trying to do some cool stuff.
I managed to fine-tune a model with QLoRA, using 'NousResearch/Llama-2-7b-chat-hf' as the base model and my own .jsonl dataset, where each line has the format {"text": "<s>[INST] Generate programming exercise of type xxx. [/INST] A programming exercise...</s>"}. I then saved the adapter with trainer.model.save_pretrained(path_adapters, safe_serialization=True).
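For context, the fine-tuning side looked roughly like this (a simplified sketch along the lines of the standard QLoRA examples; the hyperparameters, dataset filename, and quantization settings here are illustrative, not my exact script):

# Simplified sketch of the QLoRA setup (values are illustrative, not my exact script)
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model_name = 'NousResearch/Llama-2-7b-chat-hf'
# each line of the .jsonl: {"text": "<s>[INST] ... [/INST] ...</s>"}
dataset = load_dataset('json', data_files='exercises.jsonl', split='train')

# 4-bit quantization config for QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map='auto',
)
model = prepare_model_for_kbit_training(model)

# LoRA adapter on the attention projections (the usual Llama-2 target modules)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=['q_proj', 'v_proj'],
    task_type='CAUSAL_LM',
)
model = get_peft_model(model, lora_config)

# ... training loop / trainer over `dataset` omitted here ...

# After training, only the LoRA adapter weights get saved;
# in my real script this was trainer.model.save_pretrained(path_adapters, safe_serialization=True)
model.save_pretrained(r'C:\MODEL\adapters', safe_serialization=True)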
Now I'm trying to load the original 'NousResearch/Llama-2-7b-chat-hf' model together with the adapter and run a query locally, but the results look as if the adapter/fine-tuning is not being taken into account:
from peft import PeftModelForCausalLM
from transformers import AutoModelForCausalLM, AutoTokenizer

# Local paths to the base model and the saved LoRA adapter
base_model_path = r'C:\MODEL\Llama-2-7b-chat-hf'
adapter_path = r'C:\MODEL\adapters'
adapter_name = 'adapter_1'
device = 'cpu'

tokenizer = AutoTokenizer.from_pretrained(base_model_path)
tokenizer.pad_token = tokenizer.eos_token

# Load the base model on the CPU
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    device_map=device,
    low_cpu_mem_usage=True,
)
base_model.config.use_cache = False

# Wrap the base model with the fine-tuned adapter
adapted_model = PeftModelForCausalLM.from_pretrained(
    base_model,
    adapter_path,
    adapter_name=adapter_name,
    device_map=device,
)
print(adapted_model.active_adapter)  # sanity check: prints 'adapter_1'

# adapted_model.load_adapter(adapter_path, adapter_name=adapter_name)
# adapted_model.set_adapter(adapter_name)
# adapted_model.enable_adapters()

# Run a single greedy-decoded query
prompt = 'Generate programming exercise of type xxx.'
input_ids = tokenizer(prompt, return_tensors='pt', truncation=True).input_ids
outputs = adapted_model.generate(input_ids=input_ids, max_new_tokens=512, do_sample=False)
output = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(output)
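One check I was going to try on top of this, assuming PEFT's disable_adapter() can be used as a context manager the way I've seen in examples, is to compare against the same prompt with the adapter switched off:

# Sanity check (continuing from the code above): generate the same prompt with the
# adapter temporarily disabled and compare it to the adapted output.
with adapted_model.disable_adapter():
    base_outputs = adapted_model.generate(input_ids=input_ids, max_new_tokens=512, do_sample=False)
print('--- adapter disabled ---')
print(tokenizer.decode(base_outputs[0], skip_special_tokens=True))
# If the two outputs are identical, the adapter weights are apparently not being applied.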
Keep in mind that I'm trying to run this with both the model and the adapter stored locally, on the CPU. Calling enable_adapters() breaks, and I'm not sure if or when I should use it. Eventually I'll have several adapters and will want to switch between them, roughly as in the sketch below, but for now I just want to see results with a single adapter.
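What I want later is roughly this (just a sketch based on PEFT's load_adapter / set_adapter methods, continuing from the code above; the second adapter path and name are made up):

# Sketch of the end goal: several adapters on one base model, switched at runtime
adapted_model.load_adapter(r'C:\MODEL\adapters_2', adapter_name='adapter_2')

adapted_model.set_adapter('adapter_1')  # use the first adapter
out_1 = adapted_model.generate(input_ids=input_ids, max_new_tokens=128, do_sample=False)

adapted_model.set_adapter('adapter_2')  # switch to the second adapter
out_2 = adapted_model.generate(input_ids=input_ids, max_new_tokens=128, do_sample=False)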
Thanks in advance.