Continue pre-training GPT2

Hi guys,

Since 2019, when OpenAI introduced GPT2, a lot has changed and new methods and optimization schemes have emerged.
I believe GPT2 is sub-optimal given the jump NLP has made since then.

Therefore, I’m trying to continue pre-training GPT2 (small, medium, large); a rough sketch of my current setup is below, and I would love to hear about your experience!

  • I’m using the openwebtext dataset. Do any of you recommend a better or richer one?
  • Have any of you tried distillation to continue pre-training GPT2?
  • Any other SOTA trick or optimization method you would recommend?
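For context, this is roughly what I have in mind: a minimal sketch using Hugging Face Transformers and Datasets, where the sequence length, output directory, and all hyperparameters (learning rate, batch size, etc.) are placeholder assumptions rather than tuned values.

```python
# Minimal sketch: continue pre-training GPT2 on openwebtext.
# Hyperparameters below are placeholders, not recommendations.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "gpt2"  # or "gpt2-medium" / "gpt2-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# openwebtext has a single "text" column
raw = load_dataset("openwebtext", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# mlm=False gives causal-LM labels (shifted input_ids)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="gpt2-continued",        # placeholder path
    per_device_train_batch_size=4,      # placeholder
    gradient_accumulation_steps=8,      # placeholder
    learning_rate=1e-4,                 # placeholder; lower than from-scratch LR
    warmup_steps=1000,
    num_train_epochs=1,
    fp16=True,
    logging_steps=100,
    save_steps=5000,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```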

Hi @IdoAmit198. Do you have any updates on this? I was wondering the same thing but found limited resources on it. I want to continue pre-training GPT2 on OpenWebMath.
