Masked language modeling perplexity

samkphd31 · January 18, 2021, 10:37am

Hello, in the RoBERTa paper, the authors report the model’s perplexity. However, I have yet to find a clear definition of what perplexity means for a model trained with the masked language modeling (MLM) objective, as opposed to the causal language modeling (CLM) objective.
Could someone give me a clear definition? Thanks!

sgugger · January 19, 2021, 9:12pm

It’s the exponential of the cross-entropy loss, like for CLM.
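A minimal pure-Python sketch of that definition applied to MLM. It assumes the usual BERT/RoBERTa-style setup, where only the masked positions are scored and unmasked positions carry the label -100 (the ignore value 🤗 Transformers uses). The names `mlm_perplexity` and `IGNORE_INDEX` and the toy numbers are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical label value marking unmasked positions (🤗 Transformers uses -100).
IGNORE_INDEX = -100

def mlm_perplexity(log_probs, labels):
    """exp(mean cross-entropy) over the masked positions only.

    log_probs: one list of natural-log probabilities over the vocabulary per position.
    labels: the original token id at masked positions, IGNORE_INDEX elsewhere.
    """
    total_nll = 0.0
    n_masked = 0
    for row, label in zip(log_probs, labels):
        if label == IGNORE_INDEX:
            continue  # unmasked tokens are not predicted, so they add no loss
        total_nll -= row[label]  # negative log-likelihood of the true token
        n_masked += 1
    if n_masked == 0:
        raise ValueError("no masked positions to score")
    cross_entropy = total_nll / n_masked  # the averaged MLM loss
    return math.exp(cross_entropy)

# Toy example: vocabulary of 4 tokens, 3 positions, positions 0 and 2 masked.
probs = [
    [0.1, 0.2, 0.5, 0.2],      # masked, true token id 2 -> p = 0.5
    [0.25, 0.25, 0.25, 0.25],  # not masked, ignored
    [0.25, 0.25, 0.25, 0.25],  # masked, true token id 0 -> p = 0.25
]
log_probs = [[math.log(p) for p in row] for row in probs]
labels = [2, IGNORE_INDEX, 0]
print(mlm_perplexity(log_probs, labels))  # about 2.83, i.e. sqrt(8)
```

Here the two masked tokens received probabilities 0.5 and 0.25, so the perplexity is the geometric mean of 2 and 4, i.e. √8 ≈ 2.83. In practice you would exponentiate the averaged evaluation loss the model already reports; for instance, the 🤗 language-modeling example scripts compute `math.exp(metrics["eval_loss"])`.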

samkphd31 · January 20, 2021, 10:25am

Thank you!

sdegrace · January 21, 2021, 5:19pm

sgugger gave a good technical definition, but I believe the intuition is that perplexity estimates the size of the “pool of words” the model is effectively choosing between. A perplexity of 6 means the model is, on average, as uncertain as if it were rolling a fair die and picking one of 6 equally likely options each time it tries to guess a word. Check out section 4.2, “Weighted Branching Factor”, here.
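To make the die analogy concrete, here is a small sketch in plain Python (the helper name `perplexity` and the probabilities are illustrative, not from the thread). If the model always assigns probability 1/k to the correct token, its perplexity is exactly k. When the probabilities are uneven, perplexity is the geometric mean of 1/p, which is one way to read the “weighted” in weighted branching factor.

```python
import math

def perplexity(target_probs):
    """exp of the mean negative log-probability assigned to the correct tokens."""
    mean_nll = -sum(math.log(p) for p in target_probs) / len(target_probs)
    return math.exp(mean_nll)

# A "fair die" model: every guess spreads probability evenly over 6 candidates,
# so the correct word always gets 1/6.
print(perplexity([1 / 6] * 10))  # approximately 6

# A model that is confident on one token (p = 0.5) and unsure on another (p = 0.125)
# still gets one "effective number of choices": the geometric mean of 2 and 8.
print(perplexity([0.5, 0.125]))  # approximately 4
```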

