Hey guys,
So my goal is to send a batch of questions and contexts to a Kubernetes cluster with a pre-trained TF SavedModel (transformer-based) deployed on it.
This is how I batch, tokenize, and preprocess the requests:
test_data = [("what is the capital of north korea ?", "The capital of north korea is pyongyang."),
             ("what is the capital of south korea ?", "The capital of south korea is seoul.")]

self.tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=tokenizer_path,
                                               unk_token="<unk>", sep_token="</s>", cls_token="<s>")
tokenizer_output = self.tokenizer.batch_encode_plus(test_data, add_special_tokens=True,
                                                    padding="max_length",
                                                    max_length=self.max_length, truncation=True)
instances = [{"input_ids": tokenizer_output["input_ids"],
              "attention_mask": tokenizer_output["attention_mask"]}]
preprocess_output = {"signature_name": "serving_default", "instances": instances}
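Re-reading the TF Serving tutorial linked at the bottom, I wonder if `instances` is supposed to hold one dict per example (the "row" format) rather than one dict wrapping the whole batch. A sketch of what I mean, with made-up token ids standing in for my real tokenizer output:

```python
# Stand-in for self.tokenizer.batch_encode_plus(...) -- made-up ids; the real
# output has shape [batch_size, max_length].
tokenizer_output = {"input_ids": [[0, 5, 6, 2], [0, 7, 8, 2]],
                    "attention_mask": [[1, 1, 1, 1], [1, 1, 1, 1]]}

# Row format: one {"input_ids": ..., "attention_mask": ...} dict per example.
instances = [{"input_ids": ids, "attention_mask": mask}
             for ids, mask in zip(tokenizer_output["input_ids"],
                                  tokenizer_output["attention_mask"])]
preprocess_output = {"signature_name": "serving_default", "instances": instances}
```

Is that the layout the server expects, or is the whole-batch version above also valid?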
What I get is a 500 error from the server.
So I guess my question boils down to: what is the correct way to send a batch of requests, and what am I doing incorrectly?
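In case it matters, here is roughly how I'm serializing and sending the payload; the host and model name below are placeholders for my cluster's address and the deployed model's name, and I'm assuming the standard TF Serving `:predict` endpoint shape:

```python
import json

# Placeholder endpoint: MODEL_HOST and "my-model" stand in for my actual
# cluster address and deployed model name.
url = "http://MODEL_HOST:8501/v1/models/my-model:predict"

# Minimal single-example payload, same shape as preprocess_output above.
payload = {"signature_name": "serving_default",
           "instances": [{"input_ids": [0, 5, 2], "attention_mask": [1, 1, 1]}]}
body = json.dumps(payload)
# response = requests.post(url, data=body)  # this is the call that returns 500
```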
Thanks!
References:
https://www.tensorflow.org/tfx/tutorials/serving/rest_simple
https://huggingface.co/transformers/main_classes/tokenizer.html#batchencoding