The model page for bert-base-uncased helpfully provides the following commands for passing text with masked words into a pretrained BERT model and obtaining human-readable predictions for the masked words:
from transformers import pipeline
unmasker = pipeline('fill-mask', model='bert-base-uncased')
unmasker("Hello I'm a [MASK] model.")
I would like to use the pipeline API for the same purpose with BertModels that I've already saved. Unfortunately, I haven't gotten my approach to work, even when I lift the model and tokenizer directly from Hugging Face without customizing them at all, as illustrated below.
from transformers import BertModel, BertTokenizer, pipeline
dummy_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
dummy_model = BertModel.from_pretrained('bert-base-uncased')
unmasker = pipeline('fill-mask', model=dummy_model, tokenizer=dummy_tokenizer)
unmasker("Paris is the [MASK] of France.")
This approach produces the following error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/tmp/ipykernel_501/2995091846.py in <module>
1 unmasker = pipeline('fill-mask', model=dummy_model, tokenizer=dummy_tokenizer)
----> 2 unmasker("Paris is the [MASK] of France.")
/opt/conda/lib/python3.8/site-packages/transformers/pipelines/fill_mask.py in __call__(self, inputs, *args, **kwargs)
224 - **token** (`str`) -- The predicted token (to replace the masked one).
225 """
--> 226 outputs = super().__call__(inputs, **kwargs)
227 if isinstance(inputs, list) and len(inputs) == 1:
228 return outputs[0]
/opt/conda/lib/python3.8/site-packages/transformers/pipelines/base.py in __call__(self, inputs, num_workers, batch_size, *args, **kwargs)
1025 return self.iterate(inputs, preprocess_params, forward_params, postprocess_params)
1026 else:
-> 1027 return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
1028
1029 def run_multi(self, inputs, preprocess_params, forward_params, postprocess_params):
/opt/conda/lib/python3.8/site-packages/transformers/pipelines/base.py in run_single(self, inputs, preprocess_params, forward_params, postprocess_params)
1033 model_inputs = self.preprocess(inputs, **preprocess_params)
1034 model_outputs = self.forward(model_inputs, **forward_params)
-> 1035 outputs = self.postprocess(model_outputs, **postprocess_params)
1036 return outputs
1037
/opt/conda/lib/python3.8/site-packages/transformers/pipelines/fill_mask.py in postprocess(self, model_outputs, top_k, target_ids)
95 top_k = target_ids.shape[0]
96 input_ids = model_outputs["input_ids"][0]
---> 97 outputs = model_outputs["logits"]
98
99 if self.framework == "tf":
/opt/conda/lib/python3.8/site-packages/transformers/file_utils.py in __getitem__(self, k)
2668 if isinstance(k, str):
2669 inner_dict = {k: v for (k, v) in self.items()}
-> 2670 return inner_dict[k]
2671 else:
2672 return self.to_tuple()[k]
KeyError: 'logits'
How can I fix this? Ultimately I want to customize a bert-base-uncased model and pass the customized model into the pipeline.