Loading the tokenizer from file raised: Exception: data did not match any variant of untagged enum ModelWrapper

I had to retrain the tokenizer to make it work. During retraining, I had to initialize the tokenizer with a pre_tokenizer:

from tokenizers import Tokenizer, trainers
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from transformers import PreTrainedTokenizerFast

tokenizer = Tokenizer(BPE())
tokenizer.pre_tokenizer = Whitespace()  # this is the line I was missing which caused the error
trainer = trainers.BpeTrainer(
    vocab_size=10_000,
    min_frequency=5
)
tokenizer.train(['./input.txt'], trainer)
tokenizer.save('./bpe_tokenizer.json')

# both loading paths now work
tokenizer1 = Tokenizer.from_file('./bpe_tokenizer.json')
tokenizer2 = PreTrainedTokenizerFast(tokenizer_file='./bpe_tokenizer.json')
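As a quick sanity check, here is a minimal sketch of the same save/load round trip that does not depend on an input file: it uses train_from_iterator on a tiny made-up corpus (the sample sentences and the temp path are my own, not from the original setup). With the pre_tokenizer set, the saved JSON loads back cleanly and encodes the same way as the original object.

```python
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# train a tiny BPE tokenizer in memory (sample corpus is made up for illustration)
tokenizer = Tokenizer(BPE())
tokenizer.pre_tokenizer = Whitespace()  # the line whose absence caused the loading error
trainer = BpeTrainer(vocab_size=500, min_frequency=1)
tokenizer.train_from_iterator(["hello world", "hello tokenizer world"], trainer)

# save and reload: the round trip should succeed and agree with the original
path = os.path.join(tempfile.mkdtemp(), "bpe_tokenizer.json")
tokenizer.save(path)
reloaded = Tokenizer.from_file(path)
print(reloaded.encode("hello world").tokens)
```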