Hello, is there an existing “Fast” (Rust-based) whitespace tokenizer?
If not, can I train a “Fast” whitespace tokenizer myself?
The reason I want a “Fast” tokenizer is that I would like to use the offset mapping to recover multi-word entities in NER.
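
To make the use case concrete, here is a rough plain-Python sketch of what I mean. It does not use the tokenizers library at all: `whitespace_tokenize_with_offsets` and `recover_entities` are made-up helper names, and the BIO tags are written by hand rather than predicted by a model. It only illustrates the kind of character spans I would hope to get from a Fast tokenizer's offset mapping, and how I would use them to stitch multi-word entities back together.

```python
import re


def whitespace_tokenize_with_offsets(text):
    """Split on whitespace and return (token, (start, end)) pairs.

    Hypothetical helper: mimics the character offsets a whitespace
    tokenizer would report, using only the standard library.
    """
    return [(m.group(), (m.start(), m.end())) for m in re.finditer(r"\S+", text)]


def recover_entities(text, offsets, tags):
    """Merge B-/I- tagged tokens into (label, surface string) pairs.

    Hypothetical helper: consecutive tokens of one entity are joined by
    slicing the original text from the first token's start offset to the
    last token's end offset, so the original spacing is preserved.
    """
    spans = []
    current = None  # [label, start, end] of the entity being built
    for (start, end), tag in zip(offsets, tags):
        if tag.startswith("B-"):
            if current:
                spans.append(current)
            current = [tag[2:], start, end]
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[2] = end  # extend the open entity to this token
        else:
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(label, text[s:e]) for label, s, e in spans]


text = "Barack Obama visited New York City yesterday"
pairs = whitespace_tokenize_with_offsets(text)
offsets = [span for _, span in pairs]

# Assumed, hand-written tags (one per whitespace token) for illustration.
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "I-LOC", "O"]

entities = recover_entities(text, offsets, tags)
print(entities)
```

With one tag per whitespace token, the offsets are all I need to map predictions back onto the original string, which is why I am looking for a tokenizer that exposes them.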