How to convert the sentence-transformers/msmarco-distilbert-base-tas-b model to TorchScript

Hi everyone, I need to host a custom Hugging Face model on OpenSearch, and it should include the following scalar_quantization logic.

Model: sentence-transformers/msmarco-distilbert-base-tas-b

changes to add to huggingface model:

from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F
import numpy
from math import floor

def scalar_quantization(dataset, B=127):
    """Quantize a sequence of floats to signed 8-bit integers.

    Non-negative values are scaled into [0, B] relative to the largest
    value in *dataset*; negative values are scaled relative to the
    magnitude of the most negative value and then shifted by +1.
    Fractions are rounded to the nearest integer, with exact halves
    rounding down (only frac > 0.5 rounds up).

    Returns a list of numpy.int8 values, one per input element.
    """
    pos_max = numpy.max(dataset)
    neg_max = -numpy.min(dataset)   # magnitude of the most negative value
    lower = 0
    quantized = []
    for value in dataset:
        if value >= 0:
            scaled = (value - lower) / (pos_max - lower) * B
        else:
            scaled = (value - lower) / (neg_max - lower) * B + 1
        whole = floor(scaled)
        # Nearest-integer rounding; a fractional part of exactly 0.5 stays down.
        quantized.append(numpy.int8(whole + 1 if scaled - whole > 0.5 else whole))
    return quantized

# Helper: Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    """Average token embeddings, counting only non-padding positions.

    model_output[0] holds the per-token embeddings (assumed
    (batch, seq_len, dim)); attention_mask is (batch, seq_len) with 1
    on real tokens.  The clamp prevents division by zero for an
    all-padding row.
    """
    embeddings = model_output[0]
    mask = attention_mask.unsqueeze(-1).expand(embeddings.size()).float()
    summed = torch.sum(embeddings * mask, 1)
    counts = torch.clamp(mask.sum(1), min=1e-9)
    return summed / counts


def model_fn(model_dir, temp=None):
    """Load the model and tokenizer saved under *model_dir*.

    *temp* is unused; it is kept so the signature matches the
    SageMaker inference handler contract.  Returns (model, tokenizer).
    """
    loaded_tokenizer = AutoTokenizer.from_pretrained(model_dir)
    loaded_model = AutoModel.from_pretrained(model_dir)
    return loaded_model, loaded_tokenizer


def predict_fn(data, model_and_tokenizer):
    """Embed the sentences in data[0] and scalar-quantize the result.

    NOTE(review): only the FIRST embedding (sentence_embeddings[0]) is
    quantized and returned, wrapped in a one-element list — presumably
    the payload carries a single sentence per request; confirm against
    the caller.
    """
    model, tokenizer = model_and_tokenizer

    sentences = data[0]
    print(sentences)

    encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

    # Inference only — no gradient tracking needed.
    with torch.no_grad():
        outputs = model(**encoded)

    # Mask-aware mean pooling, then L2-normalize each embedding.
    embeddings = mean_pooling(outputs, encoded['attention_mask'])
    embeddings = F.normalize(embeddings, p=2, dim=1)

    # Quantize just the first sentence's embedding to int8.
    return [scalar_quantization(embeddings[0].tolist())]

So I hosted the model successfully on AWS SageMaker by referring to this article, and deployed the model successfully on AWS OpenSearch.

Now I need to deploy the same model on a local OpenSearch instance, but the model.tar created for SageMaker is not supported, since OpenSearch needs a TorchScript or ONNX file format. So, referring to this article, I created a script that converts the model to TorchScript, but when I deploy this model on OpenSearch it fails.

script to convert to torchscript:

from transformers import DistilBertModel, DistilBertTokenizer, DistilBertConfig
import torch
import os

# If you are instantiating the model with *from_pretrained* you can also easily set the TorchScript flag
# Load the tokenizer that was saved alongside the fine-tuned model.
tokenizer = DistilBertTokenizer.from_pretrained("./modal/", local_files_only=True)

# Build a dummy input for tracing.
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = tokenizer.tokenize(text)

# Masking one of the input tokens (kept from the original tracing recipe).
masked_index = 8
tokenized_text[masked_index] = '[MASK]'
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)

tokens_tensor = torch.tensor([indexed_tokens])
# BUG FIX: DistilBERT has NO token_type_ids — its forward signature is
# (input_ids, attention_mask).  The original code passed BERT-style
# segment ids as the second positional argument, which tracing would
# silently treat as the attention mask.  Use a real attention mask.
attention_mask = torch.ones_like(tokens_tensor)

# BUG FIX: the original code instantiated DistilBertModel(config), which
# builds a model with RANDOMLY initialized weights — the traced file then
# produces meaningless embeddings, so OpenSearch rejects/fails with it.
# Load the actual fine-tuned weights from disk instead, with
# torchscript=True so the model returns traceable tuples.
model = DistilBertModel.from_pretrained(
    "./modal/", torchscript=True, local_files_only=True
)

# Tracing must run in evaluation mode so dropout layers are disabled.
model.eval()

# Trace with the (input_ids, attention_mask) pair and save the result.
traced_model = torch.jit.trace(model, [tokens_tensor, attention_mask])
torch.jit.save(traced_model, "traced_bert.pt")

The following is the error raised when I deploy the model on OpenSearch:

So can someone point out where I am going wrong, due to which I am seeing this error?

1 Like