Strange output using BioBERT for imputing MASK tokens

I’m trying to use BioBERT (downloaded from the HuggingFace models repository at dmis-lab/biobert-v1.1) to fill in MASK tokens in text, and I’m getting some unexpected behavior with the suggested tokens.

I pasted a screenshot below comparing bert-base-uncased (which behaves as expected, with sensible most-likely tokens) against BioBERT:

Here’s the code to reproduce this:

from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

text = 'heart disease is [MASK] leading cause of death in the united states.'

def top_mask_tokens(model_name, k=10):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name)
    tokenized = tokenizer(text, return_tensors='pt')
    # Find the position of the [MASK] token in the tokenized input
    idx = tokenizer.convert_ids_to_tokens(tokenized.input_ids[0]).index(tokenizer.mask_token)
    output = model(**tokenized, return_dict=True)
    # Return the k most likely tokens at the masked position
    return tokenizer.convert_ids_to_tokens(torch.topk(output.logits[0, idx, :], k).indices)

print(top_mask_tokens('bert-base-uncased'))
print(top_mask_tokens('dmis-lab/biobert-v1.1'))
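As an aside for anyone reproducing this: the fill-mask pipeline wraps the same tokenize/forward/top-k steps with less boilerplate. A sketch (I've only run the explicit version above):

from transformers import pipeline

# The pipeline finds the [MASK] position and reports the most likely
# fill-ins along with their scores (top 5 by default)
unmasker = pipeline('fill-mask', model='dmis-lab/biobert-v1.1')
print(unmasker('heart disease is [MASK] leading cause of death in the united states.'))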

And here’s my output from running transformers-cli env:

- `transformers` version: 4.1.1
- Platform: macOS-10.11.6-x86_64-i386-64bit
- Python version: 3.8.5
- PyTorch version (GPU?): 1.4.0 (False)
- Tensorflow version (GPU?): not installed (NA)
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No

I also asked about a similar issue with PubMedBERT in a GitHub issue a while back, but haven't gotten a response.

Do the pretrained weights for these models not contain the components necessary for doing masked language modeling/imputing MASK tokens? Is there any way to fix this issue?

Hi,

I am not an expert, but that is what it looks like to me.

Masked language modelling is mainly used during pre-training and is often not needed for fine-tuning, so I guess the DMIS team didn't think the MLM parameters would be required.

I notice that the DMIS team have provided 5 models. Do any of the other models have MLM parameters? One way to check is sketched below.
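from_pretrained can report which parameters were missing from a checkpoint and had to be randomly initialized; for a checkpoint without an MLM head, that's exactly the head weights, which would explain the garbage predictions. A sketch (substitute the other model IDs from the dmis-lab Hub page):

from transformers import AutoModelForMaskedLM

# Add the other dmis-lab model IDs to this list to check them too
for name in ['dmis-lab/biobert-v1.1']:
    _, info = AutoModelForMaskedLM.from_pretrained(name, output_loading_info=True)
    # Parameters listed here were absent from the checkpoint and were
    # randomly initialized (e.g. the cls.predictions.* MLM head weights)
    print(name, info['missing_keys'])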

It should certainly be possible to copy the DMIS weights into a model of your own that does include an MLM head. I expect you would then need to train that model before it gives sensible answers, unless you can find a suitable MLM head to copy (probably not…). A minimal sketch of the idea follows.
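This assumes BioBERT v1.1 shares bert-base-cased's architecture and vocabulary (it was initialized from that checkpoint), so the tensor shapes line up:

from transformers import BertForMaskedLM, BertModel

# Start from a checkpoint that does ship a trained MLM head...
mlm = BertForMaskedLM.from_pretrained('bert-base-cased')
# ...and swap in BioBERT's encoder
mlm.bert = BertModel.from_pretrained('dmis-lab/biobert-v1.1')
# Re-tie the output embeddings to the (new) input embeddings
mlm.tie_weights()
# The borrowed head was trained against bert-base-cased's encoder, so
# you would still need to fine-tune with the MLM objective before the
# predictions become sensible.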

The reason you are not getting a response is that this is nearly impossible to debug: these are third-party models that someone else trained. It is possible that they never trained/fine-tuned these models on MLM, in which case the model doesn't know what to output for the mask.

You should try to get in touch with the model creators to get an answer.