MLM pipeline with saved/customized BertModel

The model page for bert-base-uncased helpfully provides the following commands for passing text with masked words into a pretrained BERT model and obtaining human-readable predictions for the masked words:

from transformers import pipeline
unmasker = pipeline('fill-mask', model='bert-base-uncased')
unmasker("Hello I'm a [MASK] model.")

I would like to be able to use the pipeline API for the same purpose with BertModel instances that I've already saved. Unfortunately, I haven't gotten my approach to work, even when I've lifted the model and tokenizer directly from Hugging Face and not customized them at all, as illustrated below.

from transformers import BertModel, BertTokenizer, pipeline

dummy_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
dummy_model = BertModel.from_pretrained("bert-base-uncased")
unmasker = pipeline('fill-mask', model=dummy_model, tokenizer=dummy_tokenizer)
unmasker("Paris is the [MASK] of France.")

This approach resulted in the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_501/2995091846.py in <module>
      1 unmasker = pipeline('fill-mask', model=dummy_model, tokenizer=dummy_tokenizer)
----> 2 unmasker("Paris is the [MASK] of France.")

/opt/conda/lib/python3.8/site-packages/transformers/pipelines/fill_mask.py in __call__(self, inputs, *args, **kwargs)
    224             - **token** (`str`) -- The predicted token (to replace the masked one).
    225         """
--> 226         outputs = super().__call__(inputs, **kwargs)
    227         if isinstance(inputs, list) and len(inputs) == 1:
    228             return outputs[0]

/opt/conda/lib/python3.8/site-packages/transformers/pipelines/base.py in __call__(self, inputs, num_workers, batch_size, *args, **kwargs)
   1025             return self.iterate(inputs, preprocess_params, forward_params, postprocess_params)
   1026         else:
-> 1027             return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
   1028 
   1029     def run_multi(self, inputs, preprocess_params, forward_params, postprocess_params):

/opt/conda/lib/python3.8/site-packages/transformers/pipelines/base.py in run_single(self, inputs, preprocess_params, forward_params, postprocess_params)
   1033         model_inputs = self.preprocess(inputs, **preprocess_params)
   1034         model_outputs = self.forward(model_inputs, **forward_params)
-> 1035         outputs = self.postprocess(model_outputs, **postprocess_params)
   1036         return outputs
   1037 

/opt/conda/lib/python3.8/site-packages/transformers/pipelines/fill_mask.py in postprocess(self, model_outputs, top_k, target_ids)
     95             top_k = target_ids.shape[0]
     96         input_ids = model_outputs["input_ids"][0]
---> 97         outputs = model_outputs["logits"]
     98 
     99         if self.framework == "tf":

/opt/conda/lib/python3.8/site-packages/transformers/file_utils.py in __getitem__(self, k)
   2668         if isinstance(k, str):
   2669             inner_dict = {k: v for (k, v) in self.items()}
-> 2670             return inner_dict[k]
   2671         else:
   2672             return self.to_tuple()[k]

KeyError: 'logits'

How can I overcome this? I want to be able to customize a bert-base-uncased model and pass that model into the pipeline.

Hi @svenchilton

Two points here:

  1. In general, it is advised to use the Auto classes (e.g. AutoTokenizer or AutoModel), as they are more flexible and will automatically pick the appropriate model class under the hood.
  2. As for the issue you are facing: you are loading a model without a task head. I think using BertForMaskedLM instead of BertModel should resolve your issue, or, even better in keeping with the point above, AutoModelForMaskedLM (see the sketch below).
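
For instance, a minimal sketch of that suggestion, reusing the same checkpoint as above, might look like this:

from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

# load the checkpoint with its masked-LM head, then hand both pieces to the pipeline
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)
unmasker("Paris is the [MASK] of France.")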

@lvwerra, thanks for writing back. :smiley: Perhaps a little more context is in order, though. I'm putting together a tutorial illustrating how to optimize a Hugging Face transformer for inference with NVIDIA's Torch-TensorRT framework. Ideally, the demo will show that inference with a Torch-TensorRT-optimized transformer is faster than with a TorchScript transformer, which in turn is faster than a transformer lifted directly from Hugging Face (though I realize you can lift a TorchScript version directly from Hugging Face as well). In addition to passing random data into the various versions of the model for benchmarking purposes, I want to verify in human-readable terms that the different versions behave as they should, i.e., I want to pass masked sentences into each version of the model and obtain sensible, human-readable results. Can I do that with either the pipeline or the BertForMaskedLM API?

Side note: I do have a BertConfig object for the model that my coworkers optimize in their original benchmarking script, which I now need to adapt. It appears that they intentionally did not include an LM head. For my human-readability sanity check, could I do something like the procedure below:

opt_mlm_model = BertForMaskedLM(my_bert_config).from_pretrained_model(my_optimized_bert_model)

and then pass a masked sentence into opt_mlm_model?

Aha! I think my new idea should work! When I run the following code

from transformers import BertTokenizer, BertForMaskedLM, pipeline

mlm_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
mlm_model = BertForMaskedLM.from_pretrained("bert-base-uncased")
unmasker = pipeline('fill-mask', model=mlm_model, tokenizer=mlm_tokenizer)
unmasker("Paris is the [MASK] of France.")

I get these results:

[{'score': 0.9969369173049927,
  'token': 3007,
  'token_str': 'c a p i t a l',
  'sequence': 'paris is the capital of france.'},
 {'score': 0.0005914837238378823,
  'token': 2540,
  'token_str': 'h e a r t',
  'sequence': 'paris is the heart of france.'},
 {'score': 0.0004378739686217159,
  'token': 2415,
  'token_str': 'c e n t e r',
  'sequence': 'paris is the center of france.'},
 {'score': 0.00033783461549319327,
  'token': 2803,
  'token_str': 'c e n t r e',
  'sequence': 'paris is the centre of france.'},
 {'score': 0.0002699580800253898,
  'token': 2103,
  'token_str': 'c i t y',
  'sequence': 'paris is the city of france.'}]

It should be a simple matter to extract the most likely sequence. :smile:
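
For instance, since the fill-mask pipeline returns its candidates sorted by score, pulling out the top sequence could be as simple as:

results = unmasker("Paris is the [MASK] of France.")
# candidates are sorted by descending score, so the first entry is the most likely
best_sequence = results[0]['sequence']
print(best_sequence)  # 'paris is the capital of france.'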

Unfortunately, I am not familiar with the TensorRT framework. I believe that the pipeline and BertForMaskedLM classes work exclusively with transformers models. That said, you might be interested in the optimum library, which allows you to optimize transformer models, and in the ONNX integration in transformers.

As for using a custom config, the proper way to do it is the following:

opt_mlm_model = BertForMaskedLM.from_pretrained(my_optimized_bert_model, config=my_bert_config)
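
A minimal sketch of that pattern, assuming the customized model was previously written out with save_pretrained to a hypothetical local directory my_saved_bert/:

from transformers import BertConfig, BertForMaskedLM

# "my_saved_bert" is a hypothetical directory created earlier with save_pretrained
my_bert_config = BertConfig.from_pretrained("my_saved_bert")
opt_mlm_model = BertForMaskedLM.from_pretrained("my_saved_bert", config=my_bert_config)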

cc @lewtun

Hmm, darn, OK. Maybe I can go about this another way, though. When I run the following code

from transformers import BertTokenizer, BertConfig, BertForMaskedLM

mlm_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
config = BertConfig(vocab_size_or_config_json_file=32000, hidden_size=768,
    num_hidden_layers=12, num_attention_heads=12, intermediate_size=3072, torchscript=True)
mlm_model = BertForMaskedLM(config)
text = "Paris is the [MASK] of France."
encoded_input = mlm_tokenizer(text, return_tensors='pt')
output = mlm_model(**encoded_input)

As my output, I get a tuple containing a tensor of shape torch.Size([1, 9, 30522]). Is there any way I can extract the most likely masked word from that tensor?

You could use the pipeline as before in this case, no? As an explanation of the output shape: you have a batch with 1 element, 9 tokens in the sequence, and 30522 predictions per token position (one score for each token in the vocabulary). If you want the most likely token id, and you know the position of the mask in the sequence (call it pos_mask), you can do:

most_likely_token_id = torch.argmax(output.logits[0, pos_mask, :])
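
If pos_mask isn't known in advance, here is a sketch for locating it with the names from your snippet above (keeping in mind that the model there is randomly initialized, so the decoded token won't be meaningful yet):

import torch

# find the [MASK] position instead of hard-coding it
pos_mask = torch.where(encoded_input["input_ids"][0] == mlm_tokenizer.mask_token_id)[0]

# your snippet reports a tuple output (torchscript=True disables dict outputs), so the logits are output[0]
logits = output[0]
most_likely_token_id = torch.argmax(logits[0, pos_mask, :]).item()

# map the id back to a human-readable token
print(mlm_tokenizer.decode([most_likely_token_id]))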

Also, in the current setup you are not loading a pretrained checkpoint but initializing the model from scratch. You can also use the from_pretrained method for the config and then only pass the keywords that differ from the defaults, e.g.:

config = AutoConfig.from_pretrained("bert-base-uncased", hidden_size=1234)

This would load all the standard config values except the hidden size.
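
Combining that with your earlier snippet, a sketch that keeps the pretrained weights while overriding only the torchscript flag might look like:

from transformers import AutoConfig, BertForMaskedLM

# reuse the standard bert-base-uncased config, changing only the torchscript flag
config = AutoConfig.from_pretrained("bert-base-uncased", torchscript=True)
mlm_model = BertForMaskedLM.from_pretrained("bert-base-uncased", config=config)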

Thanks again. I’m going to try a few things and see how it goes. I’ll write back with my results and/or additional questions.

Following up on

most_likely_token_id = torch.argmax(output.logits[0, pos_mask, :])

if I know pos_mask (which, naively, I imagine would be 3 for "Paris is the [MASK] of France."), how can I then find the token associated with most_likely_token_id?

Hey @svenchilton, I think this example from the docs shows you how to get the top token predictions:

from transformers import AutoModelForMaskedLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("distilbert-base-cased")

sequence = (
    "Distilled models are smaller than the models they mimic. Using them instead of the large "
    f"versions would help {tokenizer.mask_token} our carbon footprint."
)

inputs = tokenizer(sequence, return_tensors="pt")
mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1]

token_logits = model(**inputs).logits
mask_token_logits = token_logits[0, mask_token_index, :]

top_5_tokens = torch.topk(mask_token_logits, 5, dim=1).indices[0].tolist()

for token in top_5_tokens:
    print(sequence.replace(tokenizer.mask_token, tokenizer.decode([token])))

@lewtun, thanks!