Bug with tokenizer's offset mapping for NER problems?

If you want to mask subtokens and special tokens, look at the script I mentioned in my earlier post, since it does just that with the word_ids method.
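
For reference, here is a minimal sketch of that kind of masking with `word_ids()`; the model name, example words, and label values are placeholders, not taken from the script referenced above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

words = ["HuggingFace", "is", "based", "in", "NYC"]   # pre-split words
labels = [3, 0, 0, 0, 5]                              # one NER label per word

encoding = tokenizer(words, is_split_into_words=True)

aligned_labels = []
previous_word_id = None
for word_id in encoding.word_ids():
    if word_id is None:
        # special tokens ([CLS], [SEP]) get -100 so the loss ignores them
        aligned_labels.append(-100)
    elif word_id != previous_word_id:
        # first subtoken of a word keeps the word-level label
        aligned_labels.append(labels[word_id])
    else:
        # remaining subtokens of the same word are masked out
        aligned_labels.append(-100)
    previous_word_id = word_id
```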