I’m trying to override `predict_fn` for a named-entity recognition (NER) task, mainly because my input sequences are very long. I need to use the tokenizer to split each long sequence into sub-sequences and then merge the predictions for all of the sub-sequences.
I call the tokenizer like this:

```python
sentences = tokenizer(sentence, max_length=max_length, stride=stride, truncation=True, return_overflowing_tokens=True)
```
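With `return_overflowing_tokens=True`, `stride` sets how many tokens consecutive chunks share, and `max_length` also counts the special tokens. The plain-Python sketch below shows roughly how the content-token windows line up. It assumes a BERT-style model that adds two special tokens per chunk, so each chunk holds `max_length - 2` content tokens. `chunk_spans` is a hypothetical helper for illustration, not part of the tokenizer API.

```python
def chunk_spans(num_tokens, window, stride):
    """Return (start, end) spans of overlapping token windows.

    window: content tokens per chunk (special tokens excluded), e.g. max_length - 2
    stride: number of tokens shared by consecutive chunks
    """
    if not 0 <= stride < window:
        raise ValueError("stride must be in [0, window)")
    step = window - stride  # how far each new chunk advances
    spans = []
    start = 0
    while True:
        end = min(start + window, num_tokens)
        spans.append((start, end))
        if end == num_tokens:  # the last chunk reaches the end of the sequence
            break
        start += step
    return spans
```

For example, 10 content tokens with `window=4` and `stride=2` give the spans `(0, 4), (2, 6), (4, 8), (6, 10)`, so every token after the first chunk's first two appears in two chunks.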
Because I use a stride, consecutive sub-sequences overlap, so I need to handle the overlapping tokens and reconcile their predictions.
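One common way to reconcile the overlaps is to average each token's logits over every chunk that contains it and then take the argmax. A minimal sketch in plain Python, assuming each chunk's logits have already had their special-token rows removed and that `spans` gives each chunk's `(start, end)` position in the full token sequence; `merge_chunk_logits` is a hypothetical name.

```python
def merge_chunk_logits(chunk_logits, spans, num_tokens):
    """Average per-token logits across overlapping chunks, then pick the best label.

    chunk_logits: one list per chunk of per-token logit rows (special tokens removed)
    spans: (start, end) position of each chunk in the full token sequence
    """
    num_labels = len(chunk_logits[0][0])
    sums = [[0.0] * num_labels for _ in range(num_tokens)]
    counts = [0] * num_tokens
    for logits, (start, end) in zip(chunk_logits, spans):
        # Row i of this chunk belongs to global token position start + i.
        for offset, row in enumerate(logits[: end - start]):
            pos = start + offset
            counts[pos] += 1
            for k, value in enumerate(row):
                sums[pos][k] += value
    labels = []
    for pos in range(num_tokens):
        if counts[pos] == 0:
            raise ValueError(f"token {pos} is not covered by any chunk")
        avg = [s / counts[pos] for s in sums[pos]]
        # Index of the highest averaged logit (first one wins on ties).
        labels.append(max(range(num_labels), key=avg.__getitem__))
    return labels
```

An alternative is to keep, for each token, the prediction from the chunk where it sits farthest from a chunk boundary, since it has the most context there. With a fast tokenizer, `word_ids()` or `return_offsets_mapping=True` can help map chunk positions back to the original words.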
I can access the model, since it is passed to `predict_fn` as a parameter, but how do I access the tokenizer?
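Assuming this is the SageMaker Hugging Face Inference Toolkit, whatever `model_fn(model_dir)` returns is what gets passed to `predict_fn` as its model argument, so one option is to load both objects in `model_fn` and return them as a tuple. This is a sketch under further assumptions: the tokenizer files are saved alongside the model in `model_dir`, the tokenizer is a fast tokenizer, and the request JSON has an `"inputs"` field. `MAX_LENGTH` and `STRIDE` are placeholder values.

```python
MAX_LENGTH = 512  # placeholder values; use whatever you tokenize with today
STRIDE = 128


def model_fn(model_dir):
    # Imported inside the function only to keep the sketch light;
    # a top-level import works just as well in a real inference script.
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    # Assumes the tokenizer files were saved in the same directory as the model.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForTokenClassification.from_pretrained(model_dir)
    model.eval()
    # The return value is handed to predict_fn, so return both objects.
    return model, tokenizer


def predict_fn(data, model_and_tokenizer):
    import torch

    # Unpack the (model, tokenizer) tuple returned by model_fn.
    model, tokenizer = model_and_tokenizer
    encodings = tokenizer(
        data["inputs"],
        max_length=MAX_LENGTH,
        stride=STRIDE,
        truncation=True,
        return_overflowing_tokens=True,
        padding=True,  # the last chunk is usually shorter; pad so chunks batch together
        return_tensors="pt",
    )
    # Pass only what the model's forward() accepts: a fast tokenizer also returns
    # "overflow_to_sample_mapping", which forward() would reject.
    with torch.no_grad():
        outputs = model(
            input_ids=encodings["input_ids"],
            attention_mask=encodings["attention_mask"],
        )
    # One block of logits per chunk; merging the overlaps would happen here.
    return outputs.logits.tolist()
```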