In [136]: tokenizer
Out[136]: Tokenizer(vocabulary_size=64000, model=BertWordPiece, unk_token=[UNK], sep_token=[SEP], cls_token=[CLS], pad_token=[PAD], mask_token=[MASK], clean_text=True, handle_chinese_chars=False, strip_accents=False, lowercase=False, wordpieces_prefix=##)
fill_masker = pipeline(task="fill-mask", model=model, tokenizer=tokenizer)
File ~/anaconda3/envs/transformers/lib/python3.11/site-packages/transformers/pipelines/fill_mask.py:211, in FillMaskPipeline._sanitize_parameters(self, top_k, targets)
208 if top_k is not None:
209 postprocess_params["top_k"] = top_k
--> 211 if self.tokenizer.mask_token_id is None:
212 raise PipelineException(
213 "fill-mask", self.model.base_model_prefix, "The tokenizer does not define a `mask_token`."
214 )
215 return {}, {}, postprocess_params
AttributeError: 'BertWordPieceTokenizer' object has no attribute 'mask_token_id'
The repr clearly shows that the tokenizer has a mask_token set to "[MASK]", yet the pipeline fails with an AttributeError on `mask_token_id`.
It can also encode this special token correctly:
In [140]: tokenizer.encode("[MASK]").ids
Out[140]: [2, 4, 3]
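My guess (an assumption, not confirmed) is that the `fill-mask` pipeline expects a `transformers` tokenizer, not a raw `tokenizers` `BertWordPieceTokenizer`, which indeed has no `mask_token_id` attribute. A minimal sketch of a workaround, using a tiny in-memory WordPiece vocabulary in place of the real 64k one, would be to wrap the trained tokenizer in `PreTrainedTokenizerFast`:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordPiece
from transformers import PreTrainedTokenizerFast

# Tiny stand-in vocabulary; the real tokenizer has vocabulary_size=64000.
vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3, "[MASK]": 4, "hello": 5}
raw_tokenizer = Tokenizer(WordPiece(vocab, unk_token="[UNK]"))

# Wrapping exposes the attributes the pipeline reads (mask_token_id etc.),
# which the raw `tokenizers` object does not define.
wrapped = PreTrainedTokenizerFast(
    tokenizer_object=raw_tokenizer,
    unk_token="[UNK]",
    sep_token="[SEP]",
    cls_token="[CLS]",
    pad_token="[PAD]",
    mask_token="[MASK]",
)

print(wrapped.mask_token_id)  # 4 — no longer None/missing
```

With a wrapper like this, `pipeline(task="fill-mask", model=model, tokenizer=wrapped)` should get past the `_sanitize_parameters` check, though I haven't verified this end to end.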