ValueError: Could not load model /opt/ml/model with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>)

I have a fine-tuned Llama 2 model which I am trying to deploy on SageMaker using sagemaker.huggingface. I have tried deploying a model fine-tuned elsewhere as well as a model fine-tuned on SageMaker. In both cases I am getting the error below:
Error:
ValueError: Could not load model /opt/ml/model with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>).

Below is the structure of my model.tar.gz:

├── code
│   ├── inference.py
│   └── requirements.txt
├── config.json
├── generation_config.json
├── model-00007-of-00007.safetensors
├── model.safetensors.index.json
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
├── tokenizer.model
├── adapter_config.json
└── adapter_model.bin

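The adapter_config.json and adapter_model.bin in the archive come from the PEFT fine-tuning. For context, merging the adapter into the base weights before packaging would look roughly like this (a sketch I have not actually run; the base-model name and paths are placeholders):

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder identifiers: base-model repo and local adapter/output paths.
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    torch_dtype=torch.float16,
)
model = PeftModel.from_pretrained(base_model, "path/to/adapter")
model = model.merge_and_unload()  # fold the LoRA weights into the base model

model.save_pretrained("merged_model")
AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf").save_pretrained("merged_model")
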
I fine-tuned Llama-2-7b-chat using transformers.Trainer and the PEFT library.
I am trying to deploy it to a SageMaker endpoint. Below is my deployment code.

from sagemaker.huggingface import HuggingFaceModel

config = {
    "HF_TASK": "text-generation",
}

huggingface_model = HuggingFaceModel(
    model_data="s3://…/output/model.tar.gz",
    role=role,  # SageMaker execution role
    transformers_version="4.28",
    pytorch_version="2.0",
    py_version="py310",
    model_server_workers=1,
    env=config,
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.4xlarge",
    container_startup_health_check_timeout=600,
)

payload = {
    "inputs": prompt,
    "parameters": {
        "do_sample": True,
        "top_p": 0.7,
        "temperature": 0.3,
        "top_k": 50,
        "max_new_tokens": 50,
        "repetition_penalty": 1.03,
    },
}

predictor.predict(payload)

I am receiving the following error (same as above) when running predict.

Error:
ValueError: Could not load model /opt/ml/model with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForCausalLM'>, <class 'transformers.models.llama.modeling_llama.LlamaForCausalLM'>).

Now, I have tried multiple approaches.

Approach 1: Fine-tuning the model on SageMaker with the HuggingFace estimator and passing the saved model.tar.gz path (huggingface_estimator.model_data) to model_data. (No inference.py file, i.e. the default inference code provided by sagemaker.huggingface.)
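
For reference, Approach 1 is wired up roughly like this (a sketch; the estimator arguments, training script name, and data channel are placeholders for my actual setup):

from sagemaker.huggingface import HuggingFace, HuggingFaceModel

# Placeholder training setup; entry_point, source_dir, and the S3 channel are illustrative.
huggingface_estimator = HuggingFace(
    entry_point="train.py",
    source_dir="scripts",
    instance_type="ml.g5.4xlarge",
    instance_count=1,
    role=role,
    transformers_version="4.28",
    pytorch_version="2.0",
    py_version="py310",
)
huggingface_estimator.fit({"train": "s3://…/train"})

# The training job's output artifact is reused directly for deployment.
huggingface_model = HuggingFaceModel(
    model_data=huggingface_estimator.model_data,  # s3://…/output/model.tar.gz
    role=role,
    transformers_version="4.28",
    pytorch_version="2.0",
    py_version="py310",
    env={"HF_TASK": "text-generation"},
)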

Approach 2: Fine-tuning the model locally, uploading the archive to an S3 bucket, and passing the model.tar.gz path as model_data="s3://…/model.tar.gz" (this archive contains the code/inference.py whose structure I showed above).
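
For Approach 2, packing and uploading the archive follows this general pattern (a sketch, not my exact commands; the local directory name and S3 prefix are placeholders):

import tarfile

from sagemaker.s3 import S3Uploader

# Pack the model directory (weights, tokenizer files, and code/) into model.tar.gz.
with tarfile.open("model.tar.gz", "w:gz") as tar:
    tar.add("local_model_dir", arcname=".")

# Upload the archive; the bucket/prefix below is a placeholder.
model_data_uri = S3Uploader.upload("model.tar.gz", "s3://…/llama2-finetuned")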

In both cases I am facing the same error as above. Here are my inference.py and requirements.txt files from Approach 2.

######## inference.py

import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from sagemaker_huggingface_inference_toolkit import decoder_encoder


def model_fn(model_dir):
    # Load the fine-tuned model and tokenizer from /opt/ml/model in 4-bit (bitsandbytes NF4).
    compute_dtype = getattr(torch, "float16")
    tokenizer = AutoTokenizer.from_pretrained(model_dir)

    quant_config = transformers.BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=compute_dtype,
        bnb_4bit_use_double_quant=False,
    )

    base_model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        quantization_config=quant_config,
        device_map={"": 0},
    )

    return base_model, tokenizer

def input_fn(input_data, content_type):
    print(input_data)
    print(content_type)

    # Deserialize the request body with the inference toolkit helper.
    sentences = decoder_encoder.decode(input_data, content_type)
    print(sentences)

    return sentences

def predict_fn(data, model):
    # model_fn returns a (model, tokenizer) tuple.
    model, tokenizer = model

    input_ids = tokenizer.encode(data, padding="max_length", truncation=True, return_tensors="pt")

    summary_ids = model.generate(input_ids=input_ids.cuda(), max_length=100, num_beams=4, do_sample=True)

    output = tokenizer.decode(summary_ids[0])

    return {"vectors": output}

def output_fn(prediction, accept):
    # Serialize the prediction into the requested accept type.
    response = decoder_encoder.encode(prediction, accept)
    print(response)

    return response

######## requirements.txt

accelerate==0.16.0
transformers==4.26.0
bitsandbytes==0.37.0

Now, I read somewhere that this issue might be related to TensorFlow and that using transformers.pipeline() might solve it.
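
For reference, this is the kind of pipeline-based model_fn I imagine (a rough sketch, not tested; the simplified predict_fn and its parameter handling are my own assumptions):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline


def model_fn(model_dir):
    # Build a text-generation pipeline from the model directory.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    return pipeline("text-generation", model=model, tokenizer=tokenizer)


def predict_fn(data, generator):
    # data is the deserialized JSON payload: {"inputs": ..., "parameters": {...}}
    inputs = data.pop("inputs", data)
    parameters = data.pop("parameters", {})
    return generator(inputs, **parameters)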

If that is the case, can someone please help me with how my inference.py file should look, or tell me whether any other file should be present in my model/code folder that I am missing?

OR

If I have written the inference.py file incorrectly, can you show me how it should look?

OR

If the error is coming from some other area, where should I look?

Thanks in advance for the help!!!