I wonder how the GPT-2 pretrained models were created. The original models were checkpointed with the TensorFlow 1 API and use a substantially different computation graph than the reimplementation in Hugging Face Transformers, so I wonder what you did to get from one to the other.
Have you found a way to adapt the originally published weights?
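For context, here is a rough sketch of the kind of conversion I would imagine: read the TF1 checkpoint variables and remap them onto the PyTorch module. I am only guessing that a helper like `load_tf_weights_in_gpt2` exists in the GPT-2 module, and the checkpoint path is a placeholder, so please correct me if the real process is different:

```python
# Rough sketch only -- load_tf_weights_in_gpt2 and the checkpoint path are my
# assumptions about how the conversion might work, not a confirmed recipe.
import tensorflow as tf

from transformers import GPT2Config, GPT2Model
from transformers.models.gpt2.modeling_gpt2 import load_tf_weights_in_gpt2

ckpt_path = "gpt2-small/model.ckpt"  # placeholder: prefix of the OpenAI TF1 checkpoint

# Inspect the original TF1 variables (model/wte, model/h0/attn/c_attn/w, ...)
reader = tf.train.load_checkpoint(ckpt_path)
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    print(name, shape)

# Build the PyTorch reimplementation (the defaults match the 124M model) and,
# I assume, remap the TF variables onto its parameters
config = GPT2Config()
model = GPT2Model(config)
load_tf_weights_in_gpt2(model, config, ckpt_path)

# Save in the usual transformers format
model.save_pretrained("gpt2-small-converted")
```

Is this roughly what was done, or was something else needed to bridge the graph differences?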
Have the openai developers shared WebText with you?
Have you trained the models on similar data?
Thanks for your help!