Hi
Thanks for helping me out. I figured I would use Hugging Face's Node.js tokenizers library to tokenize the input. Unfortunately, my Node.js code throws "TypeError: failed downcast to function" before I even get to try your solution. I get the error even when running the example code from the Node.js tokenizers library:
import { BertWordPieceTokenizer } from "tokenizers";
const wordPieceTokenizer = await BertWordPieceTokenizer.fromOptions({ vocabFile: "./vocab.txt" });
const wpEncoded = await wordPieceTokenizer.encode("Who is John?", "John is a teacher");
Also, is the vocabFile in that example the vocab.json file or the merges.txt file from the model repository? And in what order should I tokenize the inputs: in conversation order, or "past_user_inputs" first, then "generated_responses", and lastly the "text" input? This is the part of my code where the error occurs during tokenization:
let { BPETokenizer } = require("tokenizers");
let merges = 'path_to_file';
const ai_url = "http://localhost:8601/v1/models/dialogpt:predict";

async function api_request() {
  const sentence = "Can you explain why ?";
  const wordPieceTokenizer = await BPETokenizer.fromOptions({ vocabFile: merges });
  const wpEncoded = await wordPieceTokenizer.encode(sentence); // the error occurs on this line
  console.log(wpEncoded);
  batch = wpEncoded;
  console.log("Encoded sentence: " + wpEncoded);
}
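On the vocabFile question above, my current guess (please correct me if I'm wrong) is that a plain BPETokenizer wants both files: vocab.json as vocabFile and merges.txt as mergesFile, so passing only the merges path as vocabFile, as in my snippet, may itself be wrong. And since DialoGPT reuses GPT-2's byte-level BPE, maybe ByteLevelBPETokenizer is the better fit? This is a minimal sketch of what I think the call should look like (the file paths are placeholders):

let { ByteLevelBPETokenizer } = require("tokenizers");

async function tryEncode() {
  // Assumption: fromOptions takes both the vocab and merges files for BPE models
  const tokenizer = await ByteLevelBPETokenizer.fromOptions({
    vocabFile: "./vocab.json",  // placeholder: vocab.json from the model repository
    mergesFile: "./merges.txt", // placeholder: merges.txt from the model repository
  });
  const encoded = await tokenizer.encode("Can you explain why ?");
  console.log(encoded.getIds()); // token ids, if I'm reading the typings right
}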
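And on the ordering question: from what I've read about DialoGPT, the turns are concatenated chronologically, interleaving each past user input with the corresponding generated response and joining everything with GPT-2's EOS token (<|endoftext|>), with the new "text" last. I'm not sure this is right, but here is the sketch I've been working from:

// Sketch, based on my reading of how the Python conversational pipeline
// builds DialoGPT inputs: turns go oldest-first, user input and model
// response interleaved, each followed by the <|endoftext|> EOS token.
function buildDialogptInput(pastUserInputs, generatedResponses, text) {
  const eos = "<|endoftext|>";
  let history = "";
  for (let i = 0; i < pastUserInputs.length; i++) {
    history += pastUserInputs[i] + eos;
    if (i < generatedResponses.length) {
      history += generatedResponses[i] + eos;
    }
  }
  return history + text + eos;
}

// e.g. buildDialogptInput(["Hi there"], ["Hello! How can I help?"], "Can you explain why ?")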
Thank you in advance for your help.