BigBirdTokenizer can't load my vocab files, but BertTokenizer and RobertaTokenizer can.
tokenizer = RobertaTokenizer.from_pretrained('my_bpe', max_len=512) # works
tokenizer = BertTokenizer.from_pretrained('./data/my_vocab.txt') # works
tokenizer = BigBirdTokenizer.from_pretrained('my_bpe') # fails with the traceback below
175
176 def LoadFromFile(self, arg):
--> 177 return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
178
179 def Init(self,
RuntimeError: Internal: /sentencepiece/src/sentencepiece_processor.cc(818) [model_proto->ParseFromArray(serialized.data(), serialized.size())]
How can I train a tokenizer that BigBirdTokenizer can load? Thanks