Instances in tensorflow serving DialoGPT-large model

Hi. I have managed to successfully use a pretrained DialoGPT model in TensorFlow Serving (thanks to Merve). The REST API is up and running as it should. The issue occurs when I try to send data to it. When I pass in the example input (available in Hugging Face's API documentation under the conversational model section), I get the error `"error": "Missing 'inputs' or 'instances' key"`. This is the part where I get confused.
In the tutorial I watched on YouTube it's stated that "instances" are basically inputs (video section available at: tf serving tutorial | tensorflow serving tutorial | Deep Learning Tutorial 48 (Tensorflow, Python) - YouTube), but something else is written in the TensorFlow Serving documentation (available at: RESTful API | TFX | TensorFlow). So my question is: how can I get the values that need to be passed into the "instances" key, and what are they? Are they in any file in the model's Hugging Face repository? Thank you in advance for answering. (And sorry that you have to copy-paste the URLs, but the platform wouldn't let me use more than two links.) This is the code I've written in Node.js to test the REST API:

```
const axios = require('axios');

const ai_url = "http://localhost:8601/v1/models/dialogpt:predict";

const payload = {
    "instances": [],
    "inputs": {
        "past_user_inputs": ["Which movie is the best ?"],
        "generated_responses": ["It's Die Hard for sure."],
        "text": "Can you explain why ?"
    }
};

console.log(JSON.stringify(payload));

axios.post(ai_url, payload)
    .then(function (response) {
        console.log("data: " + JSON.stringify(response.data));
    })
    .catch(function (error) {
        console.log(error);
    });
```

Hello :wave:
Here you can find a neat tutorial on how to use Hugging Face models with TF Serving. As you guessed, instances are the examples you want your model to run inference on.

```
batch = tokenizer(sentence)
batch = dict(batch)
batch = [batch]
input_data = {"instances": batch}
```
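For reference, here is a minimal stdlib-only sketch of what the serialized request body ends up looking like in TF Serving's "row" format (the token IDs below are made-up placeholders, not real tokenizer output):

```python
import json

# Hypothetical tokenizer output for one sentence; the IDs are illustrative
# placeholders, not values produced by a real DialoGPT tokenizer.
batch = {"input_ids": [2061, 3807, 318], "attention_mask": [1, 1, 1]}

# TF Serving's "row" format: each example is one dict in the "instances" list.
input_data = {"instances": [batch]}

body = json.dumps(input_data)
print(body)
```

This `body` string is what you would POST to the `:predict` endpoint.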

Your payload works just fine in the Inference API, by the way. My guess is you could put your inputs into the instances part and it would work (maybe try it as a list if it doesn't). Something like:

```
batch = [{"inputs": {
    "past_user_inputs": ["Which movie is the best ?"],
    "generated_responses": ["It's Die Hard for sure."],
    "text": "Can you explain why ?"
}}]
input_data = {"instances": batch}
```

Let me know if it doesn’t work.


Hi :wave:
Thanks for helping me out. I figured I would use Hugging Face's Node.js tokenizers library to tokenize the input. Unfortunately my Node.js code throws "TypeError: failed downcast to function" before I get to try your solution. I get the error even when I'm using the example code from the Node.js tokenizers library:

```
import { BertWordPieceTokenizer } from "tokenizers";

const wordPieceTokenizer = await BertWordPieceTokenizer.fromOptions({ vocabFile: "./vocab.txt" });
const wpEncoded = await wordPieceTokenizer.encode("Who is John?", "John is a teacher");
```

Also, is the vocabFile in the example the vocab.json file or the merges.txt file from the model repository? And in what order should I tokenize the inputs? In conversation order, or "past_user_inputs" first, then "generated_responses", and lastly the "text" input? This is the part of the code where the tokenization error occurs:

```
let { BPETokenizer } = require("tokenizers");
let merges = 'path_to_file';
const ai_url = "http://localhost:8601/v1/models/dialogpt:predict";

async function api_request() {
    const sentence = "Can you explain why ?";
    const wordPieceTokenizer = await BPETokenizer.fromOptions({ vocabFile: merges });
    const wpEncoded = await wordPieceTokenizer.encode(sentence);   // the error occurs in this line
    batch = wpEncoded;
    console.log("Encoded sentence: " + wpEncoded);
}
```
Thank you for helping me in advance.

Update: for the tokenization issue I figured the best solution would be to just use Python, so I'm going to make a REST API for tokenization and call it from my Node.js code. I'll paste the code for it here when it's done. I just need help with the question from my previous post about the order in which to tokenize messages. Thank you in advance.
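Something like this stdlib-only skeleton is what I have in mind (the port, route, and `toy_tokenize` function are all hypothetical placeholders; the real version would call the DialoGPT tokenizer from transformers instead):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def toy_tokenize(text):
    # Placeholder for a real tokenizer call; returns fake "token IDs".
    return [len(word) for word in text.split()]

class TokenizeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body {"text": "..."} and answer with token IDs.
        length = int(self.headers["Content-Length"])
        request = json.loads(self.rfile.read(length))
        body = json.dumps({"input_ids": toy_tokenize(request["text"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet

def run(port=8602):
    # Blocking call; the Node.js code would POST {"text": "..."} here.
    HTTPServer(("localhost", port), TokenizeHandler).serve_forever()
```

The Node.js side would then just POST the sentence to this service and forward the returned IDs to TF Serving.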

Hello! From what I can tell, looking at the documentation of the DialoGPT implementation in Hugging Face:

```
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# Let's chat for 5 lines
for step in range(5):
    # encode the new user input, add the eos_token and return a tensor in PyTorch
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response while limiting the total chat history to 1000 tokens
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)

    # pretty print last output tokens from bot
    print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))
```

You can just tokenize the user input, append the tokenized input to the end of chat_history_ids (on the first turn of the conversation there is no history yet, I guess), and keep appending it in each turn.
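So for your ordering question, the turns go oldest first, alternating user/bot, with the new "text" last and each turn terminated by the EOS token. A minimal sketch of flattening the conversational payload that way (plain Python, no tokenizer; the EOS string below is the GPT-2/DialoGPT end-of-text token) would be:

```python
# Flatten a conversation into the order DialoGPT expects before tokenizing:
# past_user_inputs[0], generated_responses[0], ..., then the new "text",
# with each turn followed by the EOS token.
EOS = "<|endoftext|>"  # GPT-2 / DialoGPT eos_token string

past_user_inputs = ["Which movie is the best ?"]
generated_responses = ["It's Die Hard for sure."]
text = "Can you explain why ?"

turns = []
for user, bot in zip(past_user_inputs, generated_responses):
    turns.extend([user, bot])
turns.append(text)

history = EOS.join(turns) + EOS
print(history)
```

Tokenizing this single `history` string gives you the same thing the loop above builds incrementally with torch.cat.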