How were the GPT2 pretrained tensorflow models created?

Hello there,
I wonder how the GPT2 pretrained models were created. The original models were checkpointed with the TensorFlow 1 API and use a substantially different computation graph than the reimplementation in Hugging Face Transformers. I wonder what you did to get there.
Have you found a way to adapt the originally published weights?
Have the OpenAI developers shared WebText with you?
Have you trained the models on similar data?

Thanks for your help


I wasn’t on the team when this was done, but I guess the weights were converted to PyTorch using the TF-to-PyTorch conversion script, and then converted back to TF2 using the functions in the PyTorch-to-TF2 conversion module.
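To make the idea concrete, here is a minimal, self-contained sketch of what a TF1-checkpoint-to-PyTorch conversion step does: walk the checkpoint's variables, map each TF variable name onto the corresponding PyTorch parameter name, and fix up shapes. A plain dict of numpy arrays stands in for a real TF checkpoint, and the name map is an assumed, abbreviated example, not the full GPT-2 layout used by the actual transformers conversion script.

```python
import numpy as np

# Assumed, abbreviated TF-name -> PyTorch-name mapping for illustration only.
NAME_MAP = {
    "model/wte": "transformer.wte.weight",
    "model/h0/ln_1/g": "transformer.h.0.ln_1.weight",
    "model/h0/ln_1/b": "transformer.h.0.ln_1.bias",
    "model/h0/attn/c_attn/w": "transformer.h.0.attn.c_attn.weight",
}

def convert_checkpoint(tf_vars):
    """Build a PyTorch-style state dict from TF-style named variables."""
    state_dict = {}
    for tf_name, array in tf_vars.items():
        # The original GPT-2 graph stores its conv-style kernels with a
        # leading singleton dimension, e.g. (1, n_in, n_out); drop it.
        if array.ndim == 3 and array.shape[0] == 1:
            array = array[0]
        state_dict[NAME_MAP[tf_name]] = array
    return state_dict

# Stand-in for a loaded TF checkpoint (shapes match GPT-2 small).
fake_ckpt = {
    "model/wte": np.zeros((50257, 768)),
    "model/h0/attn/c_attn/w": np.zeros((1, 768, 2304)),
}
converted = convert_checkpoint(fake_ckpt)
print(sorted(converted))
print(converted["transformer.h.0.attn.c_attn.weight"].shape)
```

For the second leg of the round trip, transformers can load PyTorch weights directly into a TF2 model, e.g. `TFGPT2LMHeadModel.from_pretrained(path, from_pt=True)`, which is essentially what the PyTorch-to-TF2 conversion functions do under the hood.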

OpenAI did not share WebText with us, and there was no retraining involved.
