Hi! I’m following this tutorial to fine-tune GPT-2 on wikitext, but the tutorial doesn’t specify which size of GPT-2 is being fine-tuned. How can I find the size of the model, and is there any way to fine-tune a different size?
The --model_name_or_path=gpt2 argument passed to the script indicates that it’s the default gpt2 model from Huggingface. That would be this one, which says “This is the smallest version of GPT-2, with 124M parameters.”
To change the size of the GPT2 model you’re using, you can pass any of these GPT2 models to that argument:
gpt2, gpt2-medium, gpt2-large, gpt2-xl
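If you want to sanity-check the sizes yourself, here is a rough sketch that estimates the parameter count of each checkpoint from its architecture. The layer counts and hidden sizes in the table are the published GPT-2 configurations as I remember them, and the helper name gpt2_param_count is just something I made up for illustration, so treat this as a back-of-the-envelope check rather than the library's own method.

```python
# Rough parameter count for GPT-2 style models, computed from the architecture.
# Assumption: standard GPT-2 layout (tied input/output embeddings, 4x MLP width).

GPT2_VOCAB_SIZE = 50257      # BPE vocabulary size used by all GPT-2 checkpoints
GPT2_CONTEXT_LENGTH = 1024   # number of learned position embeddings

# model name -> (number of transformer blocks, hidden size)
GPT2_SIZES = {
    "gpt2": (12, 768),
    "gpt2-medium": (24, 1024),
    "gpt2-large": (36, 1280),
    "gpt2-xl": (48, 1600),
}


def gpt2_param_count(n_layer, n_embd,
                     vocab_size=GPT2_VOCAB_SIZE,
                     n_positions=GPT2_CONTEXT_LENGTH):
    """Hypothetical helper: count weights and biases of a GPT-2 style model."""
    # Token and position embeddings (the output layer reuses the token
    # embedding matrix, so it is not counted a second time).
    embeddings = vocab_size * n_embd + n_positions * n_embd
    # Per block: attention QKV (3d^2 + 3d), attention output (d^2 + d),
    # MLP up (4d^2 + 4d), MLP down (4d^2 + d), two layer norms (4d).
    per_block = 12 * n_embd ** 2 + 13 * n_embd
    # Final layer norm (scale and bias).
    final_layer_norm = 2 * n_embd
    return embeddings + n_layer * per_block + final_layer_norm


for name, (n_layer, n_embd) in GPT2_SIZES.items():
    count = gpt2_param_count(n_layer, n_embd)
    print(f"{name}: about {count / 1e6:.0f}M parameters")
```

For the default gpt2 this comes out to roughly 124M, which matches the number quoted on the model page. Keep in mind that the larger checkpoints need correspondingly more GPU memory to fine-tune.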
In general, the models available from Huggingface always have a short name/ID such as “gpt2”, “t5-3b”, etc., and you can use that name to look up documentation about that particular model version (how big it is, how it was trained, etc.) on huggingface.co.
That being said, I noticed that you linked to Huggingface documentation for version 2.0.0, which is several years old and contains a lot of stuff that’s outdated nowadays. I’d recommend using the documentation and GitHub version from the latest release (4.27.3). That’ll give you access to more models, better features, and it’ll be easier for people in the community to assist you.