Hey, I fine-tuned the Llama model using PEFT and QLoRA, and I load the model as follows:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftConfig, PeftModel

PEFT_MODEL = "/kaggle/working/trained-model"

config = PeftConfig.from_pretrained(PEFT_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    quantization_config=bnb_config,  # same BitsAndBytesConfig used during training
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token

# Attach the trained LoRA adapters on top of the quantized base model
model = PeftModel.from_pretrained(model, PEFT_MODEL)
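For reference, bnb_config is the BitsAndBytesConfig I used for QLoRA training, roughly along these lines (illustrative sketch; the exact values in my notebook may differ):

import torch
from transformers import BitsAndBytesConfig

# Typical 4-bit QLoRA quantization settings (sketch, not necessarily my exact config)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)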
I want to use this fine-tuned model in my RAG pipeline, which is built with LlamaIndex. The issue is that the LlamaIndex components expect the model to be loaded like this:
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core.prompts import PromptTemplate

llm = HuggingFaceLLM(
    model_name=model_loc,
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    # query_wrapper_prompt=PromptTemplate(
    #     "<|system|>Please check if the following pieces of context have any mention of the keywords "
    #     "provided in the question. If not, then say that you do not know the answer. "
    #     "Please do not make up your own answer.</s>\n<|user|>\nQuestion:{query_str}</s>\n<|assistant|>\n"
    # ),
    # query_wrapper_prompt=PromptTemplate(template),
    context_window=4096,
    max_new_tokens=512,
    # model_kwargs={"trust_remote_code": True},
    # model_kwargs={"n_gpu_layers": -1},
    generate_kwargs={"temperature": 0.5},
    device_map="auto",
)
Otherwise I get an error (AttributeError: 'LlamaForCausalLM' object has no attribute 'predict') when I run the following code:
response = sentence_query_engine.query(query)
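For context, the query engine is set up roughly like this (a simplified sketch; the embedding model name and data path below are just placeholders, not my actual setup):

from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

Settings.llm = llm  # the HuggingFaceLLM from above
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")  # placeholder embedding model

documents = SimpleDirectoryReader("data").load_data()  # "data" is a placeholder directory
index = VectorStoreIndex.from_documents(documents)
sentence_query_engine = index.as_query_engine(similarity_top_k=3)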
Is there a way to address this, for example by wrapping the already loaded CausalLM/PeftModel object in the HuggingFaceLLM class? If there isn't, can you suggest how I could fine-tune the model for RAG so that I can load it with the HuggingFaceLLM class and use it in my codebase with LlamaIndex?
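What I was hoping for is something along these lines, assuming HuggingFaceLLM can take a pre-loaded model and tokenizer object instead of a model name (I'm not sure it does, hence the question):

# Hypothetical: pass the already loaded PEFT model and tokenizer straight into HuggingFaceLLM
llm = HuggingFaceLLM(
    model=model,          # the PeftModel loaded earlier
    tokenizer=tokenizer,
    context_window=4096,
    max_new_tokens=512,
    generate_kwargs={"temperature": 0.5},
)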