Why can’t BigBirdTokenizer load my own vocab or trained BPE results?

BigBirdTokenizer can’t load my vocab results, but BertTokenizer and RobertaTokenizer can:

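from transformers import BertTokenizer, BigBirdTokenizer, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained('my_bpe', model_max_length=512)  # works
tokenizer = BertTokenizer.from_pretrained('./data/my_vocab.txt')  # works

tokenizer = BigBirdTokenizer.from_pretrained('my_bpe')  # fails with the RuntimeError below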
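The three classes above read different vocabulary files. As far as I know from the transformers source, BertTokenizer reads a WordPiece vocab.txt, RobertaTokenizer reads a byte-level BPE vocab.json plus merges.txt, and BigBirdTokenizer reads a single binary SentencePiece model (default file name spiece.model). The sketch below is a hypothetical helper, guess_loadable_tokenizers (not a transformers function), that maps the file names in a folder to the classes whose expected files are all present. It only checks names, not file contents.

```python
from pathlib import Path

# File names each slow tokenizer class expects in a from_pretrained() folder.
# Assumption: mirrors the default vocab file names in transformers; this table
# and the helper below are illustrative, not part of any library API.
EXPECTED_FILES = {
    "BertTokenizer": {"vocab.txt"},                    # WordPiece vocabulary, one token per line
    "RobertaTokenizer": {"vocab.json", "merges.txt"},  # byte-level BPE vocab + merge rules
    "BigBirdTokenizer": {"spiece.model"},              # binary SentencePiece model (protobuf)
}


def guess_loadable_tokenizers(file_names):
    """Return, sorted, the tokenizer classes whose required files are all present."""
    present = {Path(name).name for name in file_names}
    return sorted(cls for cls, needed in EXPECTED_FILES.items() if needed <= present)


print(guess_loadable_tokenizers(["my_bpe/vocab.json", "my_bpe/merges.txt"]))
print(guess_loadable_tokenizers(["my_spm/spiece.model"]))
```

With a BPE output folder (vocab.json + merges.txt) only RobertaTokenizer matches, which is consistent with the behaviour reported above. To use BigBirdTokenizer you would need a SentencePiece .model file, for example one trained with the sentencepiece library, rather than BPE text files.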



    175 
    176     def LoadFromFile(self, arg):
--> 177         return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
    178 
    179     def Init(self,

RuntimeError: Internal: /sentencepiece/src/sentencepiece_processor.cc(818) [model_proto->ParseFromArray(serialized.data(), serialized.size())] 

How can I train a tokenizer that BigBirdTokenizer can load? Thanks

Hi, may I know in which format the tokenizer files are given?

  1. vocab.txt, merge.txt
  2. vocab.json, merge.txt