Hey all,
I am relatively new to HuggingFace and deep NLP in general. I have noticed that in the documentation and in some example notebooks, tokenizers are used as follows:
tokenizer = SomeTokenizerClass()
encoding = tokenizer(text_to_tokenize, context_of_text)
Where text_to_tokenize and context_of_text are both str objects. In the documentation, this type of call is shown here.
What does this type of call to a tokenizer do, and why would it be different from encoding = tokenizer(text_to_tokenize + ' ' + context_of_text)?
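To make the comparison concrete, here is a toy sketch of the difference as it is usually described for BERT-style models: the two-argument call treats the strings as a pair, with a separator token between them and segment ids marking which part each token belongs to, while the concatenated call yields a single segment. The function toy_encode is hypothetical and uses plain whitespace splitting, so it is not the library's implementation and its output will not match a real tokenizer's; only the overall layout is meant to carry over.

```python
# Toy illustration only: this is NOT the Hugging Face implementation.
# It mimics a BERT-style layout, using whitespace splitting instead of WordPiece.

def toy_encode(text, text_pair=None):
    """Return tokens and segment ids for one string or for a pair of strings."""
    first = text.lower().split()
    tokens = ["[CLS]"] + first + ["[SEP]"]
    # Everything in the first segment (including [CLS] and its [SEP]) gets id 0.
    token_type_ids = [0] * len(tokens)
    if text_pair is not None:
        second = text_pair.lower().split()
        tokens += second + ["[SEP]"]
        # The second segment and its closing [SEP] get id 1.
        token_type_ids += [1] * (len(second) + 1)
    return {"tokens": tokens, "token_type_ids": token_type_ids}


text_to_tokenize = "Where is the cat"
context_of_text = "The cat is on the mat"

# Two-argument form: the strings stay distinguishable as two segments.
pair = toy_encode(text_to_tokenize, context_of_text)

# Concatenated form: the model sees one undivided segment.
joined = toy_encode(text_to_tokenize + " " + context_of_text)

print(pair["tokens"])
print(pair["token_type_ids"])
print(joined["tokens"])
print(joined["token_type_ids"])
```

With a real tokenizer, the equivalent check would be to decode the input_ids of both encodings and compare their token_type_ids, where the model provides them.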
Thank you so much for your help!