I’m trying to find the existing evaluation results for training GPT-2 on WikiText-2. In the GPT-2 model card, the perplexity is reported as 29.41, whereas this blog post by OpenAI reports a perplexity of 18.34 for the same task.
I was wondering whether this difference is due to a different loss (Hugging Face uses the causal language modeling loss)?
No, the difference is in which model is evaluated. The model card takes the results reported in the paper for the smallest GPT-2 model; the PPL of 18.34 is for the largest one, which is gpt2-xl on the Hub.
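For reference, here is a minimal sketch of how causal-LM perplexity can be computed with transformers; the sample text is a placeholder (a proper WikiText-2 evaluation would use a sliding window over the full test set), and you would swap `"gpt2"` for `"gpt2-xl"` to evaluate the largest variant:

```python
# Hedged sketch: causal-LM perplexity for GPT-2 via Hugging Face transformers.
# The sample text is a placeholder, not the WikiText-2 test set.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model_name = "gpt2"  # smallest variant; use "gpt2-xl" for the largest
model = GPT2LMHeadModel.from_pretrained(model_name)
tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
model.eval()

text = "The quick brown fox jumps over the lazy dog."
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels=input_ids makes the model return the mean
    # causal-LM cross-entropy loss; perplexity is exp(loss).
    out = model(**enc, labels=enc["input_ids"])

ppl = math.exp(out.loss.item())
print(f"{model_name} perplexity on sample text: {ppl:.2f}")
```

On the full WikiText-2 test set you would concatenate the text, evaluate in strided windows, and average the token-level losses before exponentiating.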
Thanks a lot for the reply.
I was also wondering how many epochs you would suggest for training GPT-2 from scratch so that it reaches a PPL of 29.41?
You won’t reach that PPL without training on a much larger dataset, as OpenAI did.