Using this repo here
- Created a container image using the Dockerfile mentioned in the instructions
- Ran it on 2 compute instances: one with 1 HPU, 16 CPUs, and 32 GB of memory, and one with 1 HPU, 50 CPUs, and 200 GB of memory
Hi @gildesh, could you share the command you used to run inference please?
Thanks for replying!
I'm running it as an endpoint with these params:
model_config={"model_name": "/root/.cache/intel/neural-chat-7b-v2", "tokenizer_name": "/root/.cache/intel/llama/neural-chat-7b-v2", "device": "hpu", "use_hpu_graphs": true, "peft_path": "/input/finetune/output/peft_model"}
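For context, here is a rough sketch of how a config like this is typically consumed when loading the model. It is purely illustrative, not the repo's actual generate.py: it assumes a standard transformers + peft loading path, that the local paths above exist, and that the Habana PyTorch plugin is installed so the "hpu" device is available.

# Hypothetical sketch only -- not the actual serving script from the repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model_config = {
    "model_name": "/root/.cache/intel/neural-chat-7b-v2",
    "tokenizer_name": "/root/.cache/intel/llama/neural-chat-7b-v2",
    "device": "hpu",
    "use_hpu_graphs": True,  # normally handled via optimum-habana; omitted in this sketch
    "peft_path": "/input/finetune/output/peft_model",
}

tokenizer = AutoTokenizer.from_pretrained(model_config["tokenizer_name"])
model = AutoModelForCausalLM.from_pretrained(
    model_config["model_name"], torch_dtype=torch.bfloat16
)
# Attach the adapter produced by the fine-tuning step.
model = PeftModel.from_pretrained(model, model_config["peft_path"])
model = model.to(model_config["device"])
model.eval()

inputs = tokenizer("Hello!", return_tensors="pt").to(model_config["device"])
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))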
Could you share the generate.py file that is used in this endpoint? Thanks!
I don't have access to intel/neural-chat-7b-v2, it doesn't seem to be on the Hugging Face Hub. Do you have a config.json file somewhere? If yes, could you tell me the value of the field model_type please?
For instance, for Intel/neural-chat-7b-v1-1, I see that the model is based on MPT: config.json · Intel/neural-chat-7b-v1-1 at main
_name_or_path": “/models/llama-v2-latest-20230719/models_hf/Llama-2-7b”,
“architectures”: [
“LlamaForCausalLM”
],
“bos_token_id”: 1,
“eos_token_id”: 2,
“hidden_act”: “silu”,
“hidden_size”: 4096,
“initializer_range”: 0.02,
“intermediate_size”: 11008,
“max_position_embeddings”: 2048,
“model_type”: “llama”,
And which version of Optimum Habana do you use?
Actually, we probably just use the latest version, since we source it from this repo.
We just put this in requirements.txt:
optimum
Could you show me the output of pip show optimum-habana please?