Batch inference using TF Serving/KFServing

Hey guys,
My goal is to send a batch of question/context pairs to a Kubernetes cluster that has a pre-trained TF SavedModel (transformer-based) deployed on it.

This is how I batch, tokenize, and preprocess the requests:

from transformers import AutoTokenizer

# (question, context) pairs that should go out as one batch
test_data = [
    ("what is the capital of north korea ?", "The capital of north korea is pyongyang."),
    ("what is the capital of south korea ?", "The capital of south korea is seoul."),
]

self.tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model_name_or_path=tokenizer_path,
    unk_token="<unk>", sep_token="</s>", cls_token="<s>")

# tokenize the whole batch of pairs in one call
tokenizer_output = self.tokenizer.batch_encode_plus(
    test_data, add_special_tokens=True,
    padding='max_length', max_length=self.max_length, truncation=True)

# build the TF Serving REST request body
instances = [
    {"input_ids": tokenizer_output["input_ids"],
     "attention_mask": tokenizer_output["attention_mask"]}]

preprocess_output = {"signature_name": "serving_default", "instances": instances}
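
The request body is then posted to the cluster; this is roughly what that looks like (the host and model name below are placeholders for my actual deployment, going through the standard TF Serving REST :predict route):

import json
import requests

# placeholder URL; the real one points at the KFServing/TF Serving route in my cluster
url = "http://<cluster-host>/v1/models/<model-name>:predict"
response = requests.post(url, data=json.dumps(preprocess_output),
                         headers={"content-type": "application/json"})
print(response.status_code, response.text)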

What I get is a 500 error from the server

So I guess my question boils down to: what is the correct way to send a batch of requests, and what am I doing incorrectly?
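
For what it's worth, if I'm reading the row format in the tfx tutorial correctly, "instances" is supposed to hold one dict per example rather than a single dict for the whole batch, so maybe it needs to look more like the sketch below (assuming the signature inputs really are named input_ids and attention_mask), but I'm not sure:

# hedged sketch: one dict per example ("row" format), assuming the
# SavedModel signature inputs are named input_ids / attention_mask
instances = [
    {"input_ids": ids, "attention_mask": mask}
    for ids, mask in zip(tokenizer_output["input_ids"],
                         tokenizer_output["attention_mask"])
]
preprocess_output = {"signature_name": "serving_default", "instances": instances}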

Thanks!

References:
https://www.tensorflow.org/tfx/tutorials/serving/rest_simple
https://huggingface.co/transformers/main_classes/tokenizer.html#batchencoding