Getting ModelError when trying to interact with deployed fine-tuned (LoRA/PEFT) model via Amazon API Gateway and AWS Lambda

Hi everyone, I have been struggling for quite a while with a problem I’m having. To provide some context: I fine-tuned a DistilBERT model with LoRA (PEFT) to save on costs and also to explore the method (after learning about it recently and seeing some great content on it from HuggingFace). I fine-tuned the distilbert-base-uncased model with a HuggingFace estimator in SageMaker for a sentiment analysis task, and it works just fine there: I can get predictions from my endpoint with no problem.

I then wanted to use this deployed model in a web app that takes in movie reviews and communicates with my model to get predictions via Amazon API Gateway and AWS Lambda.

This is my Lambda function:

import json
import boto3

def lambda_handler(event, context):
    # The SageMaker runtime is what allows us to invoke the endpoint that we've created.
    runtime = boto3.Session().client('sagemaker-runtime')

    # Put incoming review in the format the model expects and add truncation so that it can handle really long reviews
    response_body_param = {"inputs": event['body'],"parameters": {"truncation": True}}
    
    # enter the endpoint name
    endpoint_name = 'endpoint-huggingface-pytorch-inference-2023-07-20-01-45-57-792'
    
    try:
        # Now we use the SageMaker runtime to invoke our endpoint, sending the review we were given
        response = runtime.invoke_endpoint(
            EndpointName=endpoint_name,            # the name of the endpoint we created
            ContentType='application/json',        # the data format expected by the model
            Accept='application/json',             # the data format expected out of the model
            Body=json.dumps(response_body_param))  # the actual review, formatted as model input
    
        # The response is an HTTP response whose body contains the result of our inference
        result_output = response['Body'].read().decode()
        
        result = json.loads(result_output)[0]
        #result = json.loads(response.content.decode("utf-8"))
        
        label_mapping_dict = {'LABEL_0':0, 'LABEL_1':1}
        result['label'] = label_mapping_dict[result['label']]
    
        return {
            'statusCode' : 200,
            'headers' : { 'Content-Type' : 'application/json', 'Access-Control-Allow-Origin' : '*' },
            'body' : str(result)
        }
    
    except Exception as e:
        print(repr(e))
        return {
            "statusCode": 500,
            "headers": {
                "Content-Type": "application/json",
                "Access-Control-Allow-Origin": "*",
                "Access-Control-Allow-Credentials": True,
            },
            "body": json.dumps({"error": repr(e)}),
        }
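A side note on the input format, since it becomes relevant below: with a Lambda proxy integration, event['body'] is the raw request body string, so whatever the client sends (plain text or a JSON document) is forwarded verbatim as inputs to the endpoint. Here is a minimal sketch of how the body could be normalized to handle both cases; the extract_review helper and the "review" key are hypothetical, not part of my actual function:

import json

def extract_review(event):
    """Return the review text whether the client sent plain text or JSON.
    Hypothetical helper: assumes a proxy integration and a 'review' key."""
    body = event.get('body') or ''
    try:
        parsed = json.loads(body)
        if isinstance(parsed, dict) and 'review' in parsed:
            return parsed['review']  # JSON payload like {"review": "..."}
    except json.JSONDecodeError:
        pass
    return body  # plain-text body, e.g. straight from an HTML form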

When I test it, e.g. with the review “The movie was sensational. I loved it.”, it works just fine, as seen below:

Test Event Name
test_event

Response
{
  "statusCode": 200,
  "headers": {
    "Content-Type": "application/json",
    "Access-Control-Allow-Origin": "*"
  },
  "body": "{'label': 1, 'score': 0.9998412132263184}"
}

I also did a test on my API Gateway and it works (screenshot omitted).

However, when I try to send a review from the website to the model and click submit, I get this error:

{"error": "ModelError('An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message \"{\\n  \"code\": 400,\\n  \"type\": \"InternalServerException\",\\n  \"message\": \"You need to specify either `text` or `text_target`.\"\\n}\\n\". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/endpoint-huggingface-pytorch-inference-2023-07-20-01-45-57-792 in account 113878691707 for more information.')"}

Has anyone experienced this before? I would really appreciate the help, so I can understand where I’m going wrong here.

Also, see the website code below:

<!DOCTYPE html>
<html lang="en">
    <head>
        <title>Sentiment Analysis Web App</title>
        <meta charset="utf-8">
        <meta name="viewport"  content="width=device-width, initial-scale=1">
        <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css" integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous">
        <link href='https://fonts.googleapis.com/css?family=Londrina+Shadow' rel='stylesheet' type='text/css'>

        <style>
          body, html {
            height: 100%;
          }

          .bg {
            /* The image used */
            background-image: url("https://images5.alphacoders.com/329/329544.jpg");

            /* Full height */
            height: 100%;

            /* Center and scale the image nicely */
            background-position: center;
            background-repeat: no-repeat;
            background-size: cover;
          }

          h1 {
            font-family: 'Londrina Shadow', cursive;
            text-align: center;
            font-size: 55px;
            color: black;
          }

          .color-it {
            /* The color of the text */
            color: rgba(255, 255, 255, 0.85);
          }
        </style>

        <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
        <script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/js/bootstrap.min.js"></script>

        <script>
         "use strict";
         // Note: the function declaration must be at the top level (not inside a
         // try block) so the inline onsubmit handler can see it in strict mode.
         function submitForm(oFormElement) {
           try {
             var xhr = new XMLHttpRequest();
             xhr.onload = function() {
               var result = JSON.parse(xhr.responseText);
               var label = result.label;
               var score = result.score;
               var scorePercentage = (score * 100).toFixed(1);
               var resultElement = document.getElementById('result');
               if (label == 0) {
                 resultElement.className = 'bg-danger';
                 resultElement.innerHTML = 'Your review was NEGATIVE! (I am ' + scorePercentage + '% sure of this)';
               } else {
                 resultElement.className = 'bg-success';
                 resultElement.innerHTML = 'Your review was POSITIVE! (I am ' + scorePercentage + '% sure of this)';
               }
             };
             // Send the raw textarea text to the API endpoint defined on the form
             xhr.open(oFormElement.method, oFormElement.action, true);
             var review = document.getElementById('review');
             xhr.send(review.value);
           } catch (error) {
             console.error('Error occurred:', error);
           }
           return false;
         }
        </script>

    </head>
    <body>
      <div class="bg">
        <div class="container">
            <h1 id="title-color"><b>Is your movie review positive, or negative?</b></h1>
            <p class="color-it"><b>Enter your review and let's find out...</b></p>
            <form method="POST"
                  action="https://uefbqdl3vj.execute-api.us-east-1.amazonaws.com/test"
                  onsubmit="return submitForm(this);" >               <!-- (Old) API not live for this post on HF -->
                <div class="form-group">
                    <label for="review" class="color-it"><b>Review:</b></label>
                    <textarea class="form-control"  rows="5" id="review">Please write your review here.</textarea>
                </div>
                <button type="submit" class="btn btn-default">Submit</button>
            </form>
            <h1 class="bg-success" id="result"></h1>
        </div>
      </div>
    </body>
</html>

(Disclaimer: I have deleted the endpoint that I shared in the code.)

Kind regards,
Henry

How did you deploy your model? It seems like your input is not correct.

Why don’t you start with a simple node.js script and a fetch call? The error most likely comes from xhr.open(oFormElement.method, oFormElement.action, true); not creating a correct JSON request.


Hi Philipp,

Thanks for getting back to me on this! I suspected that the problem had something to do with the format of the JSON request, but I was unsure how to debug or fix it. I’ll start by trying a simple node.js script with a fetch call, like you recommended.

This is how I deployed the model:

from sagemaker.huggingface import HuggingFaceModel

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data=huggingface_estimator.model_data,
   role=role, 
   transformers_version="4.26", 
   pytorch_version="1.13", 
   py_version="py39",
   model_server_workers=1
)
# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type= "ml.g4dn.2xlarge"
)
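As a quick sanity check that the endpoint itself behaves, predictions also come back fine through the SDK predictor returned by deploy(). A minimal sketch (run in the same session that created predictor; the printed output is illustrative):

# Sanity-check the endpoint through the SageMaker SDK predictor
result = predictor.predict({
    "inputs": "The movie was sensational. I loved it.",
    "parameters": {"truncation": True},
})
print(result)  # e.g. [{'label': 'LABEL_1', 'score': 0.99...}]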

And this is how I trained it:

import os
import sys
import argparse
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    set_seed,
    default_data_collator
)
from sklearn.metrics import (
    accuracy_score, precision_recall_fscore_support
)
from datasets import load_from_disk
import torch
from transformers import Trainer, TrainingArguments
from peft import PeftConfig, PeftModel
import shutil
import random
import logging

def parse_args():
    """Parse the command-line arguments."""
    parser = argparse.ArgumentParser()
    # add model id and dataset path argument
    parser.add_argument(
        "--model_id",
        type=str,
        default="distilbert-base-uncased",
        help="Model id to use for training.",
    )
    parser.add_argument(
        "--train_dataset_path",
        type=str,
        default="lm_fine_tuning_train_dataset",
        help="Path to fine-tuning training dataset.",
    )
    """
    parser.add_argument(
        "--test_dataset_path",
        type=str,
        default="lm_test_dataset",
        help="Path to test dataset.",
    )
    """
    # add training hyperparameters for epochs, batch size, learning rate, and seed
    parser.add_argument("--epochs", type=int, default=3, help="Number of epochs to train for.")
    parser.add_argument(
        "--per_device_train_batch_size",
        type=int,
        default=1,
        help="Batch size to use for training.",
    )
    parser.add_argument("--lr", type=float, default=5e-5, help="Learning rate to use for training.")
    parser.add_argument("--seed", type=int, default=42, help="Seed to use for training.")
    parser.add_argument(
        "--gradient_checkpointing",
        type=bool,
        default=True,
        help="Whether to use gradient checkpointing.",
    )
    parser.add_argument(
        "--bf16",
        type=bool,
        default=True if torch.cuda.get_device_capability()[0] == 8 else False,
        help="Whether to use bf16.",
    )
    args = parser.parse_known_args()
    return args

def create_peft_config(model):
    from peft import (
        get_peft_model,
        LoraConfig,
        TaskType,
        prepare_model_for_int8_training,
    )

    peft_config = LoraConfig(
        task_type = TaskType.SEQ_CLS,
        #inference_mode=False,
        r=32, # rank, i.e. the LoRA attention dimension
        lora_alpha=32, # The alpha parameter for Lora scaling
        target_modules=["q_lin", "v_lin"], # use this if you know the target modules you want for model
        lora_dropout=0.05, # The dropout probability for Lora layers
        bias="none",
    )
    
    #model = prepare_model_for_int8_training(model)
    model = get_peft_model(model, peft_config)
    model.print_trainable_parameters()
    return model

# compute metrics function for binary classification (custom metrics)
"""
def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average="binary")
    acc = accuracy_score(labels, preds)
    return {"accuracy": acc, "f1": f1, "precision": precision, "recall": recall}
"""

def training_function(args):
    # set seed
    set_seed(args.seed)
    
    # Set up logging
    logger = logging.getLogger(__name__)
    
    logging.basicConfig(
        level=logging.getLevelName("INFO"),
        handlers=[logging.StreamHandler(sys.stdout)],
        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    )
    
    # load datasets
    train_dataset = load_from_disk(args.train_dataset_path)
    #test_dataset = load_from_disk(args.test_dataset_path)
    
    logger.info("The loaded train_dataset length is: %s", len(train_dataset))
    #logger.info("The loaded test_dataset length is: %s", len(test_dataset))
    
    # load model from the hub
    model = AutoModelForSequenceClassification.from_pretrained(
        args.model_id,
        #use_cache=False if args.gradient_checkpointing else True,  # this is needed for gradient checkpointing (see below:)
        #Gradient Checkpointing is a method used for reducing the memory footprint when training deep neural networks,
        # at the cost of having a small increase in computation time
        #device_map="auto",
        #load_in_8bit=True,
        torch_dtype=torch.bfloat16
    )
    # create peft config
    model = create_peft_config(model)

    # Define training args
    output_dir = "/tmp"
    training_args = TrainingArguments(
        output_dir=output_dir,
        overwrite_output_dir=True,
        per_device_train_batch_size=args.per_device_train_batch_size,
        #bf16=args.bf16,  # Use BF16 if available
        learning_rate=args.lr,
        num_train_epochs=args.epochs,
        #gradient_checkpointing=args.gradient_checkpointing,
        #gradient_accumulation_steps=2,
        # logging strategies
        logging_dir=f"{output_dir}/logs",
        #logging_strategy="steps",
        #logging_steps=10,
        save_strategy="no",
        optim="adafactor",
    )
    
    """
    training_args = TrainingArguments(
    output_dir=output_dir,
    learning_rate=1e-5,
    num_train_epochs=1,
    weight_decay=0.01,
    logging_steps=1,
    max_steps=1
)
"""

    # Create Trainer instance
    trainer = Trainer(
        model=model,
        args=training_args,
        # if we want to use custom metrics
        #compute_metrics=compute_metrics,
        train_dataset=train_dataset,
        #eval_dataset=test_dataset,
        data_collator=default_data_collator,
    )

    # Start training
    trainer.train()

    # merge adapter weights with base model and save
    # save int 8 model (int 8 if prepared for int 8)
    trainer.model.save_pretrained(output_dir)
    #tokenizer.save_pretrained(output_dir)
    
    # evaluate model
    #eval_result = trainer.evaluate(eval_dataset=test_dataset)

    """
    # writes eval results to a file which can be accessed later in the s3 output
    with open(os.path.join(output_dir, "eval_results.txt"), "w") as writer:
        print("***** Eval results *****")
        for key, value in sorted(eval_result.items()):
            writer.write(f"{key} = {value}\n")
    """
    # clear memory
    del model 
    del trainer
    
    # load PEFT model in fp16
    peft_config = PeftConfig.from_pretrained(output_dir)
    model = AutoModelForSequenceClassification.from_pretrained(
        peft_config.base_model_name_or_path,
        return_dict=True,
        torch_dtype=torch.bfloat16, # use bfloat16 instead of float16
        low_cpu_mem_usage=True,
    )
    
    
    model = PeftModel.from_pretrained(model, output_dir, torch_dtype=torch.bfloat16)
    model.eval()
    # Merge LoRA and base model and save
    merged_model = model.merge_and_unload()
    merged_model.save_pretrained("/opt/ml/model/")

    # save tokenizer for easy inference
    tokenizer = AutoTokenizer.from_pretrained(args.model_id)
    tokenizer.save_pretrained("/opt/ml/model/")

    """
    # copy inference script
    os.makedirs("/opt/ml/model/code", exist_ok=True)
    shutil.copyfile(
        os.path.join(os.path.dirname(__file__), "inference.py"),
        "/opt/ml/model/code/inference.py",
    )
    """
    """
    # copy requirements file
    shutil.copyfile(
        os.path.join(os.path.dirname(__file__), "requirements.txt"),
        "/opt/ml/model/code/requirements.txt",
    )
    """


def main():
    args, _ = parse_args()
    training_function(args)


if __name__ == "__main__":
    main()
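For completeness: the estimator call that launched this script isn't shown above, but it was along these lines. This is only a sketch; the entry-point filename, source directory, and hyperparameter values here are illustrative, not copied from my notebook:

from sagemaker.huggingface import HuggingFace

# Hypothetical launcher for the training script above; names are illustrative
huggingface_estimator = HuggingFace(
    entry_point="train.py",            # the script above, assumed filename
    source_dir="scripts",              # assumed directory
    instance_type="ml.g4dn.2xlarge",
    instance_count=1,
    role=role,
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
    hyperparameters={"model_id": "distilbert-base-uncased", "epochs": 3},
)
huggingface_estimator.fit({"training": training_input_path})  # assumed S3 input path variable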

I’ll update here again after I’ve tried the node.js script with the fetch call.

Hi Philipp,

So I tried the node.js script with a fetch call (see the script below):

import fetch from "node-fetch";

async function debugAPI() {
  const apiEndpoint = "{endpoint_url}"; // Replace with the actual API endpoint URL
  const review = 'This is a test review. Great movie';

  try {
    const response = await fetch(apiEndpoint, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ review }),
    });

    if (!response.ok) {
      throw new Error(`API returned status ${response.status} - ${response.statusText}`);
    }

    const data = await response.json();
    console.log('API Response:', data);
  } catch (error) {
    console.error('Error occurred:', error);
  }
}

debugAPI();

And got the error:

Error occurred: FetchError: invalid json response body at https://1bfbkkk5n8.execute-api.us-east-1.amazonaws.com/prod reason: Unexpected token ' in JSON at position 1
    at /Users/weyinmi/Documents/Data Science/CODE/Generative AI/PEFT LoRA Sentiment Analysis Project/debug/node_modules/node-fetch/lib/index.js:273:32
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async debugAPI (file:///Users/weyinmi/Documents/Data%20Science/CODE/Generative%20AI/PEFT%20LoRA%20Sentiment%20Analysis%20Project/debug/debug_api.js:20:18) {
  type: 'invalid-json'
}

This suggests that the response body coming back from the API is not valid JSON. To dive deeper, I changed the node.js script to log the raw response received from the API endpoint, so that the entire response is logged, including the response headers and the response body. See the code below:

import fetch from "node-fetch";

async function debugAPI() {
  const apiEndpoint = "{endpoint_url}"; // Replace with the actual API endpoint URL
  const review = 'This is a test review. Terrible movie';

  try {
    const response = await fetch(apiEndpoint, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ review }),
    });

    console.log('API Response Headers:', response.headers.raw());
    const responseBody = await response.text();
    console.log('API Response Body:', responseBody);
  } catch (error) {
    console.error('Error occurred:', error);
  }
}

debugAPI();

This produced some interesting output:

API Response Headers: [Object: null prototype] {
  date: [ 'Fri, 21 Jul 2023 20:02:25 GMT' ],
  'content-type': [ 'application/json' ],
  'content-length': [ '41' ],
  connection: [ 'close' ],
  'x-amzn-requestid': [ 'c57e8395-2fc8-4125-95cd-bfc84badefb7' ],
  'access-control-allow-origin': [ '*' ],
  'x-amz-apigw-id': [ 'IbiceFAIoAMFhRg=' ],
  'x-amzn-trace-id': [
    'Root=1-64bae44f-414db83d757ed1211ef13a3b;Sampled=0;lineage=fc0f2e58:0'
  ]
}
API Response Body: {'label': 1, 'score': 0.9993422627449036}

So it seems that the response body actually contains the prediction (the test review in this case was positive). I also tried a negative test review, and even though I’m still getting the same error, the response body again has the prediction, as seen below:

API Response Headers: [Object: null prototype] {
  date: [ 'Fri, 21 Jul 2023 20:03:42 GMT' ],
  'content-type': [ 'application/json' ],
  'content-length': [ '41' ],
  connection: [ 'close' ],
  'x-amzn-requestid': [ '59c1c552-05c0-4612-bb69-07c7db2726f1' ],
  'access-control-allow-origin': [ '*' ],
  'x-amz-apigw-id': [ 'IbiomGi3IAMFZkA=' ],
  'x-amzn-trace-id': [
    'Root=1-64bae49d-390bc7b818815de7181aea74;Sampled=0;lineage=fc0f2e58:0'
  ]
}
API Response Body: {'label': 0, 'score': 0.9995365142822266}
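One thing stands out in both outputs: the body uses single quotes ({'label': ..., 'score': ...}), which is a Python dict's repr rather than valid JSON. A quick check (my own sketch, using the body string copied from above) reproduces exactly the parse failure node-fetch reported:

import json

body = "{'label': 0, 'score': 0.9995365142822266}"  # copied from the output above
try:
    json.loads(body)
except json.JSONDecodeError as e:
    print(e)  # Expecting property name enclosed in double quotes: line 1 column 2 (char 1)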

So I ran the app again to remind myself of the error. This is what I get:

"error": "ModelError('An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message \"{\\n  \"code\": 400,\\n  \"type\": \"InternalServerException\",\\n  \"message\": \"You need to specify either `text` or `text_target`.\"\\n}\\n\"

So now I’m just trying to tie everything together to figure it out. It’s reassuring to at least know that the predictions are being returned in the response body; the problem is probably in the exact way they’re being returned.
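If the single-quoted body is indeed the culprit, the likely fix (my assumption, not verified yet) is to serialize the Lambda response with json.dumps instead of str, so that the browser and node-fetch receive valid JSON:

# In the Lambda handler's success branch, instead of 'body': str(result):
return {
    'statusCode': 200,
    'headers': {'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*'},
    'body': json.dumps(result)  # double-quoted keys -> valid JSON
}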