Return all scores parameter doesn't work with model deployed on Inf1

YannAgora · March 19, 2024, 10:15am

Hello there,

I’ve deployed a text classification model (based on xlm-roberta) on an ml.inf1.2xlarge instance, following @philschmid’s blog article.

My problem is that when I call the model with return_all_scores=True in the payload, I only get the class with the highest confidence score. I don’t know whether the problem comes from the custom predict_fn(), the DLC I’m using, or something else. If someone could help me resolve the issue, it would be much appreciated.

Custom inference script

import os
from transformers import AutoConfig, AutoTokenizer
import torch
import torch.neuron

# To use one neuron core per worker
os.environ["NEURON_RT_NUM_CORES"] = "1"

# saved weights name
AWS_NEURON_TRACED_WEIGHTS_NAME = "neuron_model.pt"

def model_fn(model_dir):
    # load tokenizer and neuron model from model_dir
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = torch.jit.load(os.path.join(model_dir, AWS_NEURON_TRACED_WEIGHTS_NAME))
    model_config = AutoConfig.from_pretrained(model_dir)

    return model, tokenizer, model_config

def predict_fn(data, model_tokenizer_model_config):
    # unpack model, tokenizer and model config
    model, tokenizer, model_config = model_tokenizer_model_config

    # create embeddings for inputs
    inputs = data.pop("inputs", data)
    embeddings = tokenizer(
        inputs,
        return_tensors="pt",
        max_length=model_config.traced_sequence_length,
        padding="max_length",
        truncation=True,
    )
    # convert to tuple for neuron model
    neuron_inputs = tuple(embeddings.values())

    # run prediction
    with torch.no_grad():
        predictions = model(*neuron_inputs)[0]
        scores = torch.nn.Softmax(dim=1)(predictions)

    # return a list of dictionaries, which is JSON serializable
    return [{"label": model_config.id2label[item.argmax().item()], "score": item.max().item()} for item in scores]

DLC container: 763104351884.dkr.ecr.eu-west-1.amazonaws.com/huggingface-pytorch-inference-neuron:1.9.1-transformers4.12.3-neuron-py37-sdk1.17.1-ubuntu18.04
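
One thing visible in the script above: predict_fn() never reads any parameters from the payload, and its return statement keeps only the argmax of each row, so return_all_scores cannot have any effect as written. The following is a minimal sketch of how the post-processing could honor the flag. It assumes the payload has the shape {"inputs": ..., "parameters": {"return_all_scores": true}}; the helper names read_return_all_scores and format_predictions are hypothetical and not part of any library, and the scores are plain Python lists (for example from scores.tolist()).

```python
from typing import Any, Dict, List


def read_return_all_scores(data: Dict[str, Any]) -> bool:
    """Read the flag from the request payload.

    Assumption: the flag lives under a "parameters" key, e.g.
    {"inputs": "...", "parameters": {"return_all_scores": True}}.
    """
    parameters = data.get("parameters") or {}
    return bool(parameters.get("return_all_scores", False))


def format_predictions(
    scores: List[List[float]],
    id2label: Dict[int, str],
    return_all_scores: bool = False,
) -> List[Any]:
    """Turn per-class probabilities into JSON-serializable results.

    scores: one row of class probabilities per input.
    id2label: mapping from class index to label name.
    """
    results = []
    for row in scores:
        if return_all_scores:
            # one entry per class, in class-index order
            results.append(
                [{"label": id2label[i], "score": s} for i, s in enumerate(row)]
            )
        else:
            # only the highest-scoring class, as in the original script
            best = max(range(len(row)), key=row.__getitem__)
            results.append({"label": id2label[best], "score": row[best]})
    return results
```

Inside predict_fn(), the flag would need to be read from data and the final line replaced with something like format_predictions(scores.tolist(), model_config.id2label, read_return_all_scores(data)). This has not been tested on an Inf1 endpoint.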
