Hey guys,
So my goal is to send a batch of questions and contexts to a Kubernetes cluster with a pre-trained TF SavedModel (transformer-based) deployed on it.
This is how I batch, tokenize, and preprocess the requests:
test_data = [("what is the capital of north korea ?", "The capital of north korea is pyongyang."),
             ("what is the capital of south korea ?", "The capital of south korea is seoul.")]

self.tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=tokenizer_path,
                                               unk_token="<unk>", sep_token="</s>", cls_token="<s>")
tokenizer_output = self.tokenizer.batch_encode_plus(test_data, add_special_tokens=True,
                                                    padding="max_length",
                                                    max_length=self.max_length, truncation=True)
instances = [{"input_ids": tokenizer_output["input_ids"],
              "attention_mask": tokenizer_output["attention_mask"]}]
preprocess_output = {"signature_name": "serving_default", "instances": instances}
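Re-reading the TF Serving tutorial linked at the bottom, I wonder if `instances` is supposed to hold one dict per example (the "row" format) rather than one dict wrapping the whole batch. A sketch of what I mean, with made-up token ids standing in for my real tokenizer output:

```python
# Stand-in for self.tokenizer.batch_encode_plus(...) -- made-up ids; the real
# output has shape [batch_size, max_length].
tokenizer_output = {"input_ids": [[0, 5, 6, 2], [0, 7, 8, 2]],
                    "attention_mask": [[1, 1, 1, 1], [1, 1, 1, 1]]}

# Row format: one {"input_ids": ..., "attention_mask": ...} dict per example.
instances = [{"input_ids": ids, "attention_mask": mask}
             for ids, mask in zip(tokenizer_output["input_ids"],
                                  tokenizer_output["attention_mask"])]
preprocess_output = {"signature_name": "serving_default", "instances": instances}
```

Is that the layout the server expects, or is the whole-batch version above also valid?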
What I get is a 500 error from the server.
So I guess my question boils down to: what is the correct way to send a batch of requests, and what am I doing incorrectly?
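In case it matters, here is roughly how I'm serializing and sending the payload; the host and model name below are placeholders for my cluster's address and the deployed model's name, and I'm assuming the standard TF Serving `:predict` endpoint shape:

```python
import json

# Placeholder endpoint: MODEL_HOST and "my-model" stand in for my actual
# cluster address and deployed model name.
url = "http://MODEL_HOST:8501/v1/models/my-model:predict"

# Minimal single-example payload, same shape as preprocess_output above.
payload = {"signature_name": "serving_default",
           "instances": [{"input_ids": [0, 5, 2], "attention_mask": [1, 1, 1]}]}
body = json.dumps(payload)
# response = requests.post(url, data=body)  # this is the call that returns 500
```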
Thanks!
References:
https://www.tensorflow.org/tfx/tutorials/serving/rest_simple
https://huggingface.co/transformers/main_classes/tokenizer.html#batchencoding