Hello everyone,
I followed this tutorial to fine-tune Code Llama on my own dataset. At the end of training I added `trainer.push_to_hub()` to push the model to the Hugging Face Hub as a private model.
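For context, the end of my training script looks roughly like this (a sketch only: the LoRA values and the Trainer setup come from the tutorial, and the exact arguments here are illustrative, not my real ones):

```python
from peft import LoraConfig, get_peft_model

# Illustrative LoRA config (placeholder values, not my actual hyperparameters)
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # base CodeLlama-7b wrapped with the LoRA adapter

# ... transformers.Trainer is set up on this PEFT-wrapped model as in the tutorial ...
trainer.train()
trainer.push_to_hub()  # pushes the result to my private repo on the Hub
```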
When I use the code below for inference, loading the adapter from the local output directory, it works:
```python
import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

output_dir = "xxx"
base_model = "codellama/CodeLlama-7b-hf"

# Load the base model
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    # load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")

# Load the fine-tuned adapter from the local output directory
model = PeftModel.from_pretrained(model, output_dir)
# model.config.use_cache = True
model.eval()

validation = json.load(open("data/Validaton.json"))
output_list = list()
for item in validation:
    prompt = format_prompt(item)  # format_prompt is defined elsewhere in my script
    model_input = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        completion = tokenizer.decode(
            model.generate(**model_input, max_new_tokens=100)[0],
            skip_special_tokens=True,
        )
    # print(type(completion))
    output_list.append((item["question"], completion))
```
However, I need to be able to run inference from another machine that does not have the contents of `output_dir` stored locally, so I have to load the model directly from the Hub. So I tried the following:
```python
base_model = "Smarneh/XXXX"
tokenizer = AutoTokenizer.from_pretrained("Smarneh/XXXX", use_auth_token=True)
model = AutoModelForCausalLM.from_pretrained(base_model, use_auth_token=True)
model.eval()

# Same evaluation loop as above
for item in validation:
    prompt = format_prompt(item)
    model_input = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        completion = tokenizer.decode(
            model.generate(**model_input, max_new_tokens=100)[0],
            skip_special_tokens=True,
        )
    # print(type(completion))
    output_list.append((item["question"], completion))
```
and I got the following error:
```
raise EnvironmentError(
OSError: Smarneh/XXX does not appear to have a file named config.json. Checkout 'https://huggingface.co/Smarneh/XXX/main' for available files.
```
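In case it is relevant, this is how I would check which files `trainer.push_to_hub()` actually uploaded to the repo (a small sketch using `huggingface_hub`; the repo id is the same private one as above):

```python
from huggingface_hub import list_repo_files

# Requires being logged in (huggingface-cli login) or passing token=...,
# since the repo is private
print(list_repo_files("Smarneh/XXXX"))
```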
I googled the error message, but I could not find a case similar to mine. I would appreciate any help with this.
Thanks a lot in advance