How can I keep using the base model for inference after fine-tuning?

Hi, I have used the Transformers pipeline to fine-tune a Llama 2 model for a chatbot.
I used the code below to create the model and tokenizer:

import torch
from transformers import AutoTokenizer, pipeline as hf_pipeline
from langchain.llms import HuggingFacePipeline

tokenizer = AutoTokenizer.from_pretrained(model_path)
hf_pipeline_obj = hf_pipeline(
    "text-generation",
    model=model_path,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    base_model_inference=True,
    trust_remote_code=True,
    device_map="auto",
    max_length=1000,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
llm = HuggingFacePipeline(pipeline=hf_pipeline_obj, model_kwargs={'temperature': 0.7})

The problem is that after fine-tuning, the chatbot no longer replies correctly when I ask it a general question. How can I keep the base model available for inference even after fine-tuning?
Can you explain this, please?

Thank you,

If you fine-tuned the model with LoRA, you can use the model.disable_adapters() method, as explained here: Load adapters with 🤗 PEFT
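As a rough illustration, here is a minimal sketch of toggling between the fine-tuned and base behaviour with the Transformers PEFT integration. It assumes your LoRA adapter was saved to a hypothetical directory "my-llama2-lora" and that model_path is the same base-model path used in your question:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load the base Llama 2 weights and tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach the LoRA adapter produced by fine-tuning (hypothetical path).
model.load_adapter("my-llama2-lora")

chat = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Base-model behaviour: switch the adapter off for general questions.
model.disable_adapters()
print(chat("What is the capital of France?", max_new_tokens=50)[0]["generated_text"])

# Fine-tuned behaviour: switch the adapter back on for chatbot-specific prompts.
model.enable_adapters()

Because the adapter is only toggled on and off, a single copy of the model stays in memory and the same pipeline can serve both the general-purpose base model and the fine-tuned chatbot.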