Error training MLM with RoBERTa tokenizer

I am currently trying to train an MLM on a custom corpus using a byte-level BPE tokenizer, and I am getting the following error:

AttributeError: 'tokenizers.Tokenizer' object has no attribute 'mask_token'

Shown below is the code:

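```python
from tokenizers import Tokenizer, decoders, pre_tokenizers
from tokenizers.models import BPE
from tokenizers.processors import RobertaProcessing
from tokenizers.trainers import BpeTrainer
from transformers import DataCollatorForLanguageModeling

# RoBERTa-style special tokens
BOS = "<s>"
EOS = "</s>"
UNK = "<unk>"
PAD = "<pad>"
MASK = "<mask>"

tokenizer = Tokenizer(BPE())

tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
tokenizer.decoder = decoders.ByteLevel()
tokenizer.enable_truncation(max_length=512)
tokenizer.enable_padding()

trainer = BpeTrainer(
    vocab_size=50000,
    special_tokens=[BOS, PAD, EOS, UNK, MASK],
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet(),
)

# batch_iterator() yields batches of text from the custom corpus (not shown)
tokenizer.train_from_iterator(batch_iterator(), trainer=trainer)

tokenizer.post_processor = RobertaProcessing(
    sep=(EOS, tokenizer.token_to_id(EOS)),
    cls=(BOS, tokenizer.token_to_id(BOS)),
)

# The AttributeError is raised here: `tokenizer` is a raw tokenizers.Tokenizer,
# not a transformers tokenizer, so it has no mask_token attribute
data_collator = DataCollatorForLanguageModeling(
    tokenizer,
    mlm_probability=0.15,
    return_tensors="tf",
)
```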
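The collator needs the mask token because MLM training replaces sampled positions with that token's id, which is why it looks up mask_token on whatever tokenizer it is given. The simplified, pure-Python sketch below shows that BERT-style masking, assuming the default 80/10/10 split and the -100 "ignore" label used by DataCollatorForLanguageModeling; mask_tokens and its parameters are hypothetical names for illustration, not the library's API.

```python
import random


def mask_tokens(input_ids, mask_token_id, vocab_size, special_ids,
                mlm_probability=0.15, seed=None):
    """Simplified BERT-style masking (a sketch, not the transformers code).

    Each non-special position is selected with probability mlm_probability.
    Of the selected positions, about 80% become mask_token_id, 10% become a
    random id and 10% stay unchanged. Unselected positions get the label
    -100 so that the loss ignores them.
    """
    rng = random.Random(seed)
    inputs = list(input_ids)
    labels = [-100] * len(inputs)
    for i, tok in enumerate(input_ids):
        # Never mask special tokens such as <s> and </s>
        if tok in special_ids or rng.random() >= mlm_probability:
            continue
        labels[i] = tok  # the model must predict the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = mask_token_id  # needs the tokenizer's mask token id
        elif r < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # random replacement
        # otherwise: keep the original token
    return inputs, labels


# Hypothetical ids: 0 = <s>, 2 = </s>, 4 = <mask> (the trainer's order above)
ids = [0, 57, 912, 33, 410, 2]
masked, labels = mask_tokens(ids, mask_token_id=4, vocab_size=50000,
                             special_ids={0, 1, 2, 3, 4}, seed=0)
```

With mlm_probability=0.15, as in the collator call above, only about 15% of the non-special positions receive a real label; every other label is -100.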

Any ideas? My current environment makes it difficult to save the tokenizer to disk and load it back with from_pretrained().

Thanks

I have the same problem here. Did you find any solution to it?
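One way around the save/load step, sketched under assumptions: recent transformers versions let PreTrainedTokenizerFast wrap an in-memory tokenizers.Tokenizer through its tokenizer_object argument. The wrapper exposes mask_token, pad_token and the related attributes the collator expects. The tiny corpus is a hypothetical stand-in for batch_iterator(), and the vocabulary is shrunk so the sketch runs quickly. return_tensors="np" is used only so it runs without TensorFlow; with TensorFlow installed, "tf" should behave the same way.

```python
from tokenizers import Tokenizer, decoders, pre_tokenizers
from tokenizers.models import BPE
from tokenizers.processors import RobertaProcessing
from tokenizers.trainers import BpeTrainer
from transformers import DataCollatorForLanguageModeling, PreTrainedTokenizerFast

BOS, EOS, UNK, PAD, MASK = "<s>", "</s>", "<unk>", "<pad>", "<mask>"

# Hypothetical stand-in corpus; replace with the real batch iterator
corpus = ["the quick brown fox jumps over the lazy dog"] * 20

raw = Tokenizer(BPE())
raw.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
raw.decoder = decoders.ByteLevel()
trainer = BpeTrainer(
    vocab_size=500,  # small vocabulary, only to keep the sketch fast
    special_tokens=[BOS, PAD, EOS, UNK, MASK],
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet(),
)
raw.train_from_iterator(corpus, trainer=trainer)
raw.post_processor = RobertaProcessing(
    sep=(EOS, raw.token_to_id(EOS)),
    cls=(BOS, raw.token_to_id(BOS)),
)

# Wrap the in-memory tokenizer (no save/from_pretrained round-trip) and name
# its special tokens so the collator can find mask_token and pad_token
tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=raw,
    bos_token=BOS,
    eos_token=EOS,
    unk_token=UNK,
    pad_token=PAD,
    mask_token=MASK,
    model_max_length=512,
)

data_collator = DataCollatorForLanguageModeling(
    tokenizer,
    mlm_probability=0.15,
    return_tensors="np",
)

# The collator pads the examples and returns masked input_ids plus labels
examples = [tokenizer("the lazy dog"), tokenizer("the quick brown fox")]
batch = data_collator(examples)
```

Because the special-token strings are passed explicitly, tokenizer.mask_token_id resolves to the id the BPE trainer assigned to "<mask>".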