Next sentence prediction with google/mobilebert-uncased producing massive, near-identical logits > 10^8 for its documentation example (and >2k others tried)

With a fresh install of transformers and PyTorch, I ran the example code from the MobileBERT page of the transformers 4.11.3 documentation:

>>> from transformers import MobileBertTokenizer, MobileBertForNextSentencePrediction
>>> import torch

>>> tokenizer = MobileBertTokenizer.from_pretrained('google/mobilebert-uncased')
>>> model = MobileBertForNextSentencePrediction.from_pretrained('google/mobilebert-uncased')

>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> next_sentence = "The sky is blue due to the shorter wavelength of blue light."
>>> encoding = tokenizer(prompt, next_sentence, return_tensors='pt')

>>> outputs = model(**encoding, labels=torch.LongTensor([1]))
>>> loss = outputs.loss
>>> logits = outputs.logits

Printing the logits, we get tensor([[2.7888e+08, 2.7884e+08]], grad_fn=<AddmmBackward>)
For comparison, the logits produced on the same example using BertForNextSentencePrediction with bert-base-uncased instead are tensor([[-3.0729, 5.9056]], grad_fn=<AddmmBackward>).

I tried lots of different examples and got the same strange behavior: logits of about 2e+08 for both classes, with the first class higher only in the 3rd or 4th significant figure. At that scale, the gap still yields a softmax score of 1 for "is the next sentence" (the first class) and 0 for the other, no matter what the two sentences are or how unrelated the second is.
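To illustrate why the scores saturate: here is a minimal pure-Python softmax over the two logit values from the output above. The absolute magnitudes cancel, but a difference in the 4th significant figure of a 2.8e+08 number is still a gap of about 4e+04 in the exponent, which drives the softmax to exactly [1, 0] in float arithmetic.

```python
import math

# Logit values copied from the MobileBERT output above.
logits = [2.7888e8, 2.7884e8]

# Numerically stable softmax: subtract the max before exponentiating.
m = max(logits)
exps = [math.exp(x - m) for x in logits]
probs = [e / sum(exps) for e in exps]

print(probs)  # the gap of ~4e4 underflows the second class to 0.0
```

So even though both logits are huge and nearly identical, any pair of sentences gets probability 1 for the first class, which matches the degenerate behavior described.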

Is there something missing from the documentation's example code that needs to be done in order to get non-degenerate outputs for the next sentence prediction task the model was pretrained on?

cc @vshampor

Linked issue: Logit explosion in MobileBertForNextSentencePrediction example from documentation (and all others tried) · Issue #13990 · huggingface/transformers · GitHub