Masked language modeling perplexity

Hello, in the RoBERTa paper, the authors refer to the model’s perplexity. However, I have yet to find a clear definition of what perplexity means in the context of a model trained on the Masked Language Modeling objective, as opposed to the Causal Language Modeling task.
Could someone give me a clear definition? Thanks!

It’s the exponential of the cross-entropy loss, just like for CLM — the only difference is that for MLM the loss is computed only over the masked positions.
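To make that concrete, here is a minimal sketch of the computation. The probabilities below are made up purely for illustration — imagine a model that assigned these probabilities to the true tokens at three masked positions:

```python
import math

# Hypothetical probabilities the model assigned to the correct token
# at each of three [MASK] positions (illustrative numbers only).
probs_of_true_tokens = [0.25, 0.10, 0.50]

# Cross-entropy loss: mean negative log-likelihood over the masked positions.
loss = -sum(math.log(p) for p in probs_of_true_tokens) / len(probs_of_true_tokens)

# Perplexity is simply the exponential of that loss.
perplexity = math.exp(loss)
print(loss, perplexity)  # loss ≈ 1.4607, perplexity ≈ 4.309
```

Equivalently, perplexity is the inverse geometric mean of the probabilities assigned to the true tokens, so lower probabilities on the masked tokens mean higher perplexity.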


Thank you!

sgugger gave a fine technical definition, but I believe the intuition is that it estimates the “pool of words” the model has to choose between. A perplexity of 6 means that the model is essentially rolling a die and choosing between one of 6 options when it tries to guess what a word might be. Check out section 4.2, “Weighted Branching Factor,” here.
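You can sanity-check that intuition with a tiny example: a model that is uniformly unsure among k candidates has a perplexity of exactly k.

```python
import math

# If the true token always receives probability 1/k (uniform guess among
# k equally likely words), the cross-entropy loss is ln(k)...
k = 6
loss = -math.log(1.0 / k)

# ...and the perplexity, exp(loss), recovers the branching factor k.
print(math.exp(loss))  # 6.0
```

This is why perplexity is read as an effective vocabulary size: it collapses to the number of options only in the uniform case, and weights them by probability otherwise.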