Hi guys,
Since OpenAI introduced GPT-2 in 2019, a lot has changed and new training methods and optimization schemes have emerged.
I believe GPT-2's original training recipe is sub-optimal given the progress NLP has made since then.
Therefore, I'm trying to continue pre-training GPT-2 (small, medium, large), and would love to hear about your experience!
- I'm using the openwebtext dataset; does anyone recommend a better/richer one?
- Has anyone tried distillation while continuing to pre-train GPT-2?
- Are there any other SOTA tricks/optimization methods you'd recommend?
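For context, here's the kind of continued pre-training step I have in mind, sketched with Hugging Face `transformers` (my library choice, not something prescribed above). To keep the snippet self-contained I build a tiny randomly initialized GPT-2 from a config; for actual continued pre-training you'd load the released checkpoint with `GPT2LMHeadModel.from_pretrained("gpt2")` instead and iterate over real data.

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny config so the sketch runs anywhere; swap for
# GPT2LMHeadModel.from_pretrained("gpt2") to continue from the real checkpoint.
config = GPT2Config(n_layer=2, n_head=2, n_embd=64, vocab_size=1000)
model = GPT2LMHeadModel(config)

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Dummy batch standing in for tokenized openwebtext text.
input_ids = torch.randint(0, config.vocab_size, (2, 32))

# For causal LM pre-training, the labels are the input ids themselves;
# the model shifts them internally to compute next-token loss.
outputs = model(input_ids, labels=input_ids)
loss = outputs.loss

loss.backward()
optimizer.step()
optimizer.zero_grad()
```

In a real run you'd wrap this in a `Trainer` (or your own loop with gradient accumulation and a warmup/decay schedule), since those details matter a lot when resuming from a converged checkpoint.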