Tokenizing using JS

I’ve exported a custom PyTorch-based Transformer model to ONNX to run it on Node.js. However, the exported model seems to expect token IDs (input_ids) directly rather than raw text.
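For context, an exported transformer typically takes int64 tensors rather than strings, so any JS tokenizer has to end by producing buffers like the ones below. This is a minimal sketch that assumes the model's inputs are named input_ids and attention_mask (common for Hugging Face exports, but check your own graph) and that 0 is the padding id; packBatch is a hypothetical helper, not part of any library.

```javascript
// Hypothetical helper: packs already-tokenized sequences into flat int64
// buffers. onnxruntime-node represents int64 tensor data as BigInt64Array.
// Assumptions: pad id 0 and a fixed maxLength (both model-specific).
function packBatch(batch, maxLength, padId = 0) {
  const batchSize = batch.length;
  // Start with every position padded; the mask starts as all zeros.
  const inputIds = new BigInt64Array(batchSize * maxLength).fill(BigInt(padId));
  const attentionMask = new BigInt64Array(batchSize * maxLength);
  batch.forEach((ids, row) => {
    // Sequences longer than maxLength are truncated.
    ids.slice(0, maxLength).forEach((id, col) => {
      inputIds[row * maxLength + col] = BigInt(id);
      attentionMask[row * maxLength + col] = 1n; // 1 marks a real token
    });
  });
  return { inputIds, attentionMask, dims: [batchSize, maxLength] };
}

// Two sequences padded to length 4:
// inputIds holds 101, 7592, 102, 0, 101, 102, 0, 0
// attentionMask holds 1, 1, 1, 0, 1, 1, 0, 0
const packed = packBatch([[101, 7592, 102], [101, 102]], 4);
```

With onnxruntime-node, each buffer could then be wrapped as `new ort.Tensor("int64", packed.inputIds, packed.dims)` and passed to `session.run` under the names reported by `session.inputNames`.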

Is there any way I can perform tokenization in JS?

Or am I missing something, and the ONNX model itself can perform the tokenization as well?
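Tokenization is an ordinary algorithm over a vocabulary file, so it can in principle be reimplemented in JS, provided it matches the tokenizer the model was trained with exactly. As an illustration only, here is a minimal sketch of BERT-style WordPiece (greedy longest-match-first). The vocabulary and ids are invented, pre-tokenization is reduced to lowercasing and whitespace splitting, and a custom model may well use BPE or SentencePiece instead; wordPieceEncode is a hypothetical name.

```javascript
// Minimal WordPiece sketch: greedy longest-match-first per word.
// Assumptions: toy vocab; lowercase + whitespace pre-tokenization only
// (real BERT pre-tokenization also splits punctuation and strips accents).
function wordPieceEncode(text, vocab, { unkToken = "[UNK]", maxCharsPerWord = 100 } = {}) {
  const ids = [];
  for (const word of text.toLowerCase().split(/\s+/).filter(Boolean)) {
    if (word.length > maxCharsPerWord) {
      ids.push(vocab.get(unkToken));
      continue;
    }
    const pieces = [];
    let start = 0;
    let failed = false;
    while (start < word.length) {
      // Try the longest remaining substring first, shrinking until a vocab hit.
      let end = word.length;
      let match = null;
      while (start < end) {
        let piece = word.slice(start, end);
        if (start > 0) piece = "##" + piece; // continuation pieces carry "##"
        if (vocab.has(piece)) {
          match = piece;
          break;
        }
        end -= 1;
      }
      if (match === null) {
        failed = true;
        break;
      }
      pieces.push(vocab.get(match));
      start = end;
    }
    // If any part of the word cannot be matched, the whole word becomes [UNK].
    if (failed) ids.push(vocab.get(unkToken));
    else ids.push(...pieces);
  }
  return ids;
}

// Toy vocabulary (token -> id); a real one comes from the model's vocab file.
const vocab = new Map([
  ["[UNK]", 0], ["[CLS]", 1], ["[SEP]", 2],
  ["token", 3], ["##izing", 4], ["using", 5], ["js", 6],
]);
// "Tokenizing" splits into "token" + "##izing"; ids is [1, 3, 4, 5, 6, 2].
const ids = [vocab.get("[CLS]"), ...wordPieceEncode("Tokenizing using JS", vocab), vocab.get("[SEP]")];
```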

I have the same problem. It seems converting to ONNX is only half the battle. Maybe I’ll write a library. How hard could it be?

I have written a JavaScript library that can run the T5 tokenizer: tokenizers.js on the main branch of praeclarum/transformers-js on GitHub.
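T5 checkpoints use a SentencePiece Unigram tokenizer. The following is not code from that library; it is an independent sketch of the core Unigram step, a Viterbi search for the highest-scoring split of the input into vocabulary pieces. The pieces, ids and scores are invented, unigramEncode is a hypothetical name, and normalization, byte fallback and unknown-token handling are omitted.

```javascript
// Sketch of SentencePiece-style Unigram segmentation via Viterbi.
// Assumptions: toy pieces with made-up log-probability scores; no
// normalization or byte fallback; works on UTF-16 code units.
// Returns null if the input cannot be covered by known pieces.
function unigramEncode(text, pieces) {
  // SentencePiece marks word boundaries with "\u2581" and prepends one.
  const s = "\u2581" + text.trim().split(/\s+/).join("\u2581");
  const n = s.length;
  // best[i]: best total score for s.slice(0, i); back[i]: start of last piece.
  const best = new Array(n + 1).fill(-Infinity);
  const back = new Array(n + 1).fill(-1);
  best[0] = 0;
  for (let end = 1; end <= n; end++) {
    for (let start = 0; start < end; start++) {
      if (best[start] === -Infinity) continue;
      const piece = pieces.get(s.slice(start, end));
      if (piece === undefined) continue;
      const score = best[start] + piece.score;
      if (score > best[end]) {
        best[end] = score;
        back[end] = start;
      }
    }
  }
  if (best[n] === -Infinity) return null;
  // Follow back-pointers from the end, then reverse into reading order.
  const ids = [];
  for (let end = n; end > 0; end = back[end]) {
    ids.push(pieces.get(s.slice(back[end], end)).id);
  }
  return ids.reverse();
}

// Toy piece table: piece -> { id, score }.
const pieces = new Map([
  ["\u2581hello", { id: 0, score: -2.0 }],
  ["\u2581he", { id: 1, score: -3.0 }],
  ["llo", { id: 2, score: -3.5 }],
  ["\u2581world", { id: 3, score: -2.5 }],
  ["\u2581", { id: 4, score: -1.0 }],
  ["world", { id: 5, score: -3.0 }],
]);
// "\u2581hello" + "\u2581world" (-4.5) beats "\u2581hello" + "\u2581" + "world" (-6.0): [0, 3].
const t5StyleIds = unigramEncode("hello world", pieces);
```

Unlike WordPiece's greedy matching, Unigram scores every possible segmentation and keeps the best one, which is why a faithful port has to reproduce the piece scores as well as the vocabulary.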
