GPT2 - Training data vs size comparison for GPT2-Small/Medium and XL

heirtothedemon · August 5, 2024, 10:52am

I’m trying to finetune gpt2 to create a very basic chatbot and I’ve been trying to decide on which gpt2 model to use.
After trying out pretrained small/medium/large/xl variants, GPT-XL is already very good at create a believable dialogue while gpt-2 small is not (the other are somewhere in between).

I wanted to know how much of this is because of the training data (XL was trained on several GBs more data) and how much is because of the model size. Obviously a bigger model means better inference but for a simple conversational AI, will training gpt2 on a sufficiently large data lead to better conversation generation, or is it limited by the model size?

from GPT2-XL (1.5B)
Lisa: Hi adam, how are you?
Adam: Hi Lisa, I’m good. How is your day going?
Lisa: It’s going great. I’m just about to go to work.
Adam: Oh, I’m sorry. I didn’t mean to interrupt you.
Lisa: No, it’s fine.
Adam: I’m just trying to figure out what I’m going to do with my life.

from GPT2-Small (117M)
Lisa: Hi adam, how are you?
Adam: Hi Lisa, I’m good. How is your day going?
Lisa: I’m fine. I’m just going to go to bed.
Adam: I’m going to sleep.
Lisa: I’m going to sleep.
Adam: I’m going

(I can run gpt-2 locally but not gpt-xl, so I’m okay if I have to train gpt2-small more)

Seppel123 · February 11, 2025, 12:03am

It is primarily model size. You can also check the NanoGPT Repo, where different GPT2 versions (based on size) exist, all trained on the exact same data. Also note that training duration is also an important factor, i.e., a small model training for longer might outperform a bigger model though there are limits to this.

Topic		Replies	Views
Fine-tuning GPT2 Family (Small to XL), How should hyperparameters and generation criteria change? Models	0	1149	May 30, 2023
Fine tuning GPT2 on persona chat dataset outputs gibberish Models	1	2724	April 14, 2021
Deployed GPT-2 models vs "Model Card" question Beginners	1	657	December 22, 2021
Model Recommendations Beginners	0	1171	January 4, 2023
Fine-tuning gpt2 generates repetive test despte many hyperparameters, gpt-large/xl? Beginners	0	555	November 3, 2020

GPT2 - Training data vs size comparison for GPT2-Small/Medium and XL

Related topics