Or, alternatively, does anyone know why this:
tokenizer = Tokenizer(BPE.from_file('./tokenizer/roberta_tokenizer/vocab.json', './tokenizer/roberta_tokenizer/merges.txt'))
print("vocab_size: ", tokenizer.model.vocab)
fails with the error 'tokenizers.models.BPE' object has no attribute 'vocab'? According to the docs (Input sequences — tokenizers documentation), the model should have it.
According to tokenizers.__version__, I'm running 0.11.0. These docs are for 0.10.0, so was vocab removed in 0.11.0? Or is something just borked in my install?
UPDATE: I gave 0.10.1 a try, just for kicks, but hit the same error.
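For anyone else hitting this, the workaround I've landed on for now is to read the vocabulary from the Tokenizer object rather than from the model. A minimal sketch (using an empty BPE() so it doesn't need the vocab.json/merges.txt files; with a real tokenizer you'd construct it from your files as above):

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE

# Empty BPE model just for illustration; swap in
# BPE.from_file(...) for a real vocab/merges pair.
tokenizer = Tokenizer(BPE())

# The Tokenizer itself exposes the vocabulary, even though
# tokenizer.model.vocab raises AttributeError for me:
print("vocab_size:", tokenizer.get_vocab_size())  # 0 for an empty model
print("vocab:", tokenizer.get_vocab())            # {} for an empty model
```

get_vocab() returns the token-to-id dict, so len(tokenizer.get_vocab()) gives the same number if you only need the size.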