Issue with post-processing

I trained a new BertWordPieceTokenizer from scratch, using the same code from the example given in the docs. Then, I created a new TemplateProcessing object and assigned it as the tokenizer’s post-processor in order to add [CLS] and [SEP] tokens (also using the example code). However, when I encode sentences with the tokenizer, it doesn’t perform any post-processing.

Code:

from tokenizers import BertWordPieceTokenizer
from tokenizers.processors import TemplateProcessing

corpus = "./corpus.txt"

tokenizer = BertWordPieceTokenizer(
    clean_text=True,
    handle_chinese_chars=False,
    strip_accents=True,
    lowercase=True,
)

tokenizer.train(
    corpus,
    vocab_size=32000,
    min_frequency=2,
    show_progress=True,
    special_tokens=["[UNK]", "[CLS]", "[SEP]"],
    limit_alphabet=1000,
    wordpieces_prefix="##",
)

tokenizer.post_processor = TemplateProcessing(
    single="[CLS] $A [SEP]",
    pair="[CLS] $A [SEP] $B:1 [SEP]:1",
    special_tokens=[
        ("[CLS]", tokenizer.token_to_id("[CLS]")),
        ("[SEP]", tokenizer.token_to_id("[SEP]")),
    ],
)

output = tokenizer.encode("Hello, y'all! How are you :grin: ?")
print(output.tokens)

Output:

['hello', ',', 'y', "'", 'all', '!', 'how', 'are', 'you', '[UNK]', '?']

Hi there! Did you end up finding a solution to the problem?
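Not the original poster, but one plausible cause (an assumption, not confirmed in this thread): BertWordPieceTokenizer is a thin Python wrapper that delegates to an underlying Tokenizer object, so assigning to `tokenizer.post_processor` may only create a new attribute on the wrapper while the inner tokenizer that actually runs `encode()` keeps its old post-processor. The sketch below is a self-contained toy (the class names `InnerTokenizer` and `WrapperTokenizer` are hypothetical stand-ins, not the real library) illustrating that failure mode:

```python
class InnerTokenizer:
    """Stand-in for the underlying tokenizer that actually encodes text."""
    def __init__(self):
        self.post_processor = None

    def encode(self, text):
        tokens = text.lower().split()
        # Only the post_processor stored *here* is ever applied.
        if self.post_processor is not None:
            tokens = self.post_processor(tokens)
        return tokens


class WrapperTokenizer:
    """Stand-in for a high-level wrapper that delegates to an inner object."""
    def __init__(self):
        self._tokenizer = InnerTokenizer()

    def encode(self, text):
        return self._tokenizer.encode(text)


tok = WrapperTokenizer()

# This just creates a new attribute on the wrapper; the inner tokenizer
# that runs encode() never sees it, so nothing changes.
tok.post_processor = lambda toks: ["[CLS]"] + toks + ["[SEP]"]
print(tok.encode("hello world"))   # no [CLS]/[SEP] added

# Assigning on the wrapped object does take effect.
tok._tokenizer.post_processor = lambda toks: ["[CLS]"] + toks + ["[SEP]"]
print(tok.encode("hello world"))   # [CLS] ... [SEP] now appear
```

If that is what is happening here, setting the post-processor on the wrapped tokenizer (or building a plain Tokenizer directly instead of the Bert wrapper) would be worth trying.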