Why does tokenization take so long?

lol. That depends on the tokenizer you’re using.

Check out ‘rs-bpe’ on PyPI / GitHub. It currently outperforms both tiktoken and Hugging Face’s tokenizers library.
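
If you want to check this on your own data rather than take anyone's word for it, here is a rough timing sketch. The `benchmark` helper and the two stand-in encoders are made up for illustration and are not part of any of the libraries mentioned; swap in the encode function of whichever tokenizer you are comparing. It only measures wall-clock time on one machine, so treat the numbers as a rough guide.

```python
import time
from typing import Callable, Dict, List, Sequence


def benchmark(
    encoders: Dict[str, Callable[[str], Sequence]],
    texts: List[str],
    repeats: int = 3,
) -> Dict[str, float]:
    """Return the best-of-`repeats` wall-clock seconds each encoder needs for all texts."""
    results: Dict[str, float] = {}
    for name, encode in encoders.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            for text in texts:
                encode(text)
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


# Stand-in encoders (hypothetical placeholders, NOT real tokenizers).
# Replace them with the encode callable of the library you want to test.
def whitespace_encode(text: str) -> List[str]:
    return text.split()


def char_encode(text: str) -> List[str]:
    return list(text)


if __name__ == "__main__":
    sample = ["Why does tokenization take so long? " * 50] * 200
    timings = benchmark(
        {"whitespace": whitespace_encode, "chars": char_encode},
        sample,
    )
    # Print fastest first.
    for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f"{name}: {seconds:.4f} s")
```

Taking the best of a few repeats reduces noise from caches and background load. Use text that looks like your real workload, since relative speed can change with input length and language.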
