I’m wondering how to properly use PreTrainedTokenizerBase.build_inputs_with_special_tokens.
According to the following example:

```python
# make sure GPT2 appends EOS in begin and end
def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
    outputs = [self.bos_token_id] + token_ids_0 + [self.eos_token_id]
    return outputs

GPT2Tokenizer.build_inputs_with_special_tokens = build_inputs_with_special_tokens
```
it seems that we can simply override the default implementation, which does nothing.
But when I tried doing so in my own use case:
```python
trained_tokenizer = PreTrainedTokenizerFast(tokenizer_file='tokenizer.json')
trained_tokenizer.build_inputs_with_special_tokens = build_inputs_with_special_tokens
```
the tokenization output does not include the BOS/EOS special tokens.
Isn’t this function called automatically during tokenization?
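One difference I notice between the two snippets: the first example assigns the function to the *class* (`GPT2Tokenizer`), while I assigned it to an *instance*. Could that matter? A minimal sketch (class and function names are made up for illustration) of how Python treats the two cases differently:

```python
class Toy:
    def greet(self):
        return "hi"

def replacement(self):
    return "patched"

# Class-level assignment: the function becomes a method,
# so `self` is bound automatically on every instance.
Toy.greet = replacement
t = Toy()
print(t.greet())  # "patched"

# Instance-level assignment: the attribute is a plain function,
# NOT a bound method, so `self` is never passed in.
t2 = Toy()
t2.greet = replacement
# t2.greet() would raise:
# TypeError: replacement() missing 1 required positional argument: 'self'
```

So even if the method were consulted during tokenization, patching the instance may not behave the same as patching the class.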