How to calculate perplexity properly

It allows the model to generalize across sentence or document boundaries, which is typically what you want in generative models. This is not a requirement, by the way, but combining it with a strided window is quite powerful.
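To make the strided-window idea concrete, here is a minimal sketch in plain Python. It assumes the texts have already been joined into one token sequence, and `log_prob_fn` is a hypothetical stand-in for whatever model you use (it is not a real library call): given the preceding tokens in the window and the next token, it should return that token's natural-log probability. The names `strided_perplexity`, `max_length` and `stride` are my own choices for illustration.

```python
import math
from typing import Callable, Sequence


def strided_perplexity(
    token_ids: Sequence[int],
    log_prob_fn: Callable[[Sequence[int], int], float],  # hypothetical model hook
    max_length: int = 512,
    stride: int = 256,
) -> float:
    """Perplexity over one long token sequence using overlapping windows.

    Each window holds at most `max_length` tokens and starts `stride` tokens
    after the previous one. Only tokens not scored by an earlier window are
    scored; the overlapping tokens serve purely as context.
    """
    if not token_ids:
        raise ValueError("token_ids must not be empty")
    if not 0 < stride <= max_length:
        raise ValueError("stride must be in the range 1..max_length")

    n = len(token_ids)
    total_nll = 0.0  # summed negative log-likelihood
    scored = 0       # number of tokens scored so far
    prev_end = 0     # end position of the previous window

    for begin in range(0, n, stride):
        end = min(begin + max_length, n)
        window = token_ids[begin:end]
        n_new = end - prev_end            # tokens this window must score
        first_new = len(window) - n_new   # index of the first unscored token
        for i in range(first_new, len(window)):
            # Context is everything before position i inside this window.
            # Assumption: the very first token is scored with empty context.
            total_nll -= log_prob_fn(window[:i], window[i])
            scored += 1
        prev_end = end
        if end == n:
            break

    return math.exp(total_nll / scored)
```

With `stride == max_length` the windows do not overlap, so tokens at the start of each window get little context. A smaller stride gives every scored token more context, at the cost of more model calls.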
