How should I handle pre/post-processing with slow tokenizers for tasks like NER and question answering?

How should folks using slow tokenizers handle pre/post-processing for tasks like question answering and token classification … both of which, at least from the course, appear heavily dependent on the fast-tokenizer-only methods word_ids() and sequence_ids().

Also, I’m curious to know why the slow tokenizers don’t have word_ids and sequence_ids methods … and if there is a way we can get at, or build, the equivalent of them for slow tokenizers?

Thanks much!

There is no easy model-agnostic way to tackle those tasks with slow tokenizers, so you should really use a fast one for them.
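
If you are stuck with a slow tokenizer, one workaround that is sometimes used for token classification is to tokenize the input one word at a time and record which word each token came from. Below is a minimal sketch of that idea, not an official API. The names `tokenize_with_word_ids` and `toy_tokenize` are hypothetical and made up for illustration; `toy_tokenize` only stands in for a slow tokenizer's `tokenize()` method so the snippet runs without downloading a model. Note that this is not model-agnostic: tokenizers that are sensitive to leading spaces (for example SentencePiece or byte-level BPE) may split a word differently in isolation than in context, and special tokens are not accounted for here.

```python
from typing import Callable, List, Tuple


def tokenize_with_word_ids(
    words: List[str],
    tokenize: Callable[[str], List[str]],
) -> Tuple[List[str], List[int]]:
    """Tokenize pre-split words one at a time and record each token's word index.

    `tokenize` is any callable mapping one word to its sub-tokens
    (assumption: a slow tokenizer's `tokenize` method could be passed here).
    """
    tokens: List[str] = []
    word_ids: List[int] = []
    for word_index, word in enumerate(words):
        pieces = tokenize(word)
        tokens.extend(pieces)
        # Every sub-token of this word maps back to the same word index.
        word_ids.extend([word_index] * len(pieces))
    return tokens, word_ids


def toy_tokenize(word: str) -> List[str]:
    """Hypothetical stand-in for a real tokenizer: WordPiece-like 4-character chunks."""
    if not word:
        return []
    chunks = [word[i:i + 4] for i in range(0, len(word), 4)]
    return [chunks[0]] + ["##" + chunk for chunk in chunks[1:]]


tokens, word_ids = tokenize_with_word_ids(
    ["Hugging", "Face", "tokenizers"], toy_tokenize
)
print(tokens)    # ['Hugg', '##ing', 'Face', 'toke', '##nize', '##rs']
print(word_ids)  # [0, 0, 1, 2, 2, 2]
```

With the word indices in hand you can align word-level labels to tokens yourself (for example, label only the first token of each word). Once you add special tokens, you would also need to shift or pad this list, which fast tokenizers do for you by returning None for special tokens in word_ids().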
