I asked my teacher to fine-tune the Llama-3-8B-Instruct model. After the run finished, my teacher sent me a folder containing .json files, as shown in the picture.
How do I use these files?
Hi,
As your screenshot shows a folder containing an “adapter_model.safetensors” file, your teacher used the LoRA method to fine-tune Llama-3-8B-Instruct. With LoRA, small low-rank matrices (called adapters) are trained on top of the pre-trained (and frozen) Llama-3-8B-Instruct model, while the original weights stay untouched. This is what we call a parameter-efficient fine-tuning (or PEFT for short) method, as it is a lot more memory-friendly than full fine-tuning, where you would update all the parameters of the model.
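For context, this is roughly what such a LoRA fine-tune looks like on the training side, using the peft library. This is a minimal sketch: the rank, target modules and other hyperparameters below are illustrative assumptions, not necessarily what your teacher used.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# load the frozen base model (gated repo, requires access on the Hub)
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices (assumption)
    lora_alpha=32,                         # scaling factor (assumption)
    target_modules=["q_proj", "v_proj"],   # which linear layers get adapters (assumption)
    task_type="CAUSAL_LM",
)

# wrap the base model; only the small adapter weights are trainable
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # prints how tiny the trainable fraction is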
To perform inference, one can do the following:
from transformers import AutoTokenizer, AutoModelForCausalLM

# point both the tokenizer and the model at the folder your teacher sent you
tokenizer = AutoTokenizer.from_pretrained("path_to_your_folder")
model = AutoModelForCausalLM.from_pretrained("path_to_your_folder")

# prepare a prompt for the model
messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# autoregressively generate a completion
terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# strip the prompt tokens and decode only the newly generated ones
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
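Note that loading an 8B-parameter model in full float32 precision needs roughly 32 GB of memory. If you have a GPU available, you can optionally load the weights in half precision and let them be placed on the available device(s) automatically. A small sketch, assuming a recent Transformers install with accelerate:

import torch
from transformers import AutoModelForCausalLM

# optionally load in bfloat16 and spread the weights over the available device(s)
model = AutoModelForCausalLM.from_pretrained(
    "path_to_your_folder",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)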
The AutoModelForCausalLM class will automatically load the base model (Llama-3-8B-Instruct in your case) as well as the adapters, thanks to the PEFT integration in Transformers. One can optionally merge these adapters into the base model by calling the merge_and_unload() method, as also explained in that guide.
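If you ever want a standalone checkpoint with the adapters folded into the base weights, a minimal sketch using the peft library looks like this (the output path "merged_model" is just an illustrative name):

from peft import AutoPeftModelForCausalLM

# load the base model with the adapters applied on top of it
model = AutoPeftModelForCausalLM.from_pretrained("path_to_your_folder")

# fold the LoRA adapters into the base weights and drop the adapter modules
merged_model = model.merge_and_unload()

# save a regular checkpoint that no longer needs the adapter files
merged_model.save_pretrained("merged_model")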