Is there any difference between GPT-J and GPT-2?

All I know is that GPT-J is a GPT-3-like model, and that GPT-3 is a bigger implementation of the GPT-2 architecture. So is there any technical difference between GPT-2 and GPT-J?


An answer from the new Bing:

GPT-J and GPT-2 are both transformer models that can generate text from a prompt. However, they have some differences in their size, architecture, training data and performance.
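As a quick illustration of that shared interface (not part of the Bing answer, just a minimal sketch using the transformers text-generation pipeline with the two Hub model IDs):

```python
from transformers import pipeline

prompt = "GPT-J differs from GPT-2 in that"

# GPT-2 (the small 124M checkpoint; "gpt2-xl" is the 1.5B one)
gpt2 = pipeline("text-generation", model="gpt2")
print(gpt2(prompt, max_new_tokens=40)[0]["generated_text"])

# GPT-J 6B -- note this pulls roughly 24 GB of float32 weights,
# so in practice it is usually loaded in float16 on a GPU.
gptj = pipeline("text-generation", model="EleutherAI/gpt-j-6B")
print(gptj(prompt, max_new_tokens=40)[0]["generated_text"])
```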

GPT-J is a newer and larger model than GPT-2. It has 6 billion parameters, while GPT-2 has 1.5 billion parameters¹. GPT-J also uses Rotary Position Embedding (RoPE) to encode the position of each token, while GPT-2 uses learned absolute position embeddings².
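A quick way to see both of these differences is to compare the published configs (a minimal sketch; only the small config.json files are downloaded, and the values in the comments are taken from the Hub configs):

```python
from transformers import AutoConfig

# Only the config files are fetched here, no model weights.
gpt2_cfg = AutoConfig.from_pretrained("gpt2-xl")              # the 1.5B-parameter GPT-2
gptj_cfg = AutoConfig.from_pretrained("EleutherAI/gpt-j-6B")  # the 6B-parameter GPT-J

print(gpt2_cfg.n_layer, gpt2_cfg.n_embd)   # 48 layers, hidden size 1600
print(gptj_cfg.n_layer, gptj_cfg.n_embd)   # 28 layers, hidden size 4096

# GPT-J's config exposes a rotary_dim field for RoPE; GPT-2's config has no such field.
print(getattr(gptj_cfg, "rotary_dim", None))  # 64
print(getattr(gpt2_cfg, "rotary_dim", None))  # None
```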

GPT-J was trained on the Pile, a large-scale curated dataset created by EleutherAI, which contains diverse sources of text such as books, Wikipedia, GitHub and Reddit². GPT-2 was trained on WebText, a smaller dataset scraped from the internet by OpenAI, which mainly consists of news articles and blog posts³.

GPT-J is claimed to approach GPT-3-level performance on some natural language processing tasks, such as arithmetic, logic, quantitative reasoning and pseudocode generation. GPT-2 is less capable at these tasks and somewhat underperforms GPT-3 models of comparable size¹.

Sources:
(2) EleutherAI/gpt-j-6B · Hugging Face. https://huggingface.co/EleutherAI/gpt-j-6B. Accessed 2023/3/31.
(3) EleutherAI claims new NLP model approaches GPT-3-level performance | VentureBeat. Accessed 2023/3/31.


While the above answer is helpful, I believe GPT-J also builds sinusoidal position tables, according to the Hugging Face implementation: https://github.com/huggingface/transformers/blob/172f42c512e1bf32554ef910fe82f07916b4d4af/src/transformers/models/gptj/modeling_gptj.py#L115
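For context, rotary position embedding does start from the same kind of sin/cos tables as classic sinusoidal embeddings, but it applies them by rotating pairs of query/key channels rather than adding the table to the token embeddings, which may be why a sinusoidal helper shows up in the GPT-J source. A minimal, hypothetical sketch of that idea (not the actual transformers code; function names are made up for illustration):

```python
import torch

def rotary_tables(seq_len: int, dim: int, base: float = 10000.0):
    # Sinusoidal frequency table, same form as classic sinusoidal embeddings.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq_len, dim/2)
    return torch.sin(angles), torch.cos(angles)

def apply_rotary(x: torch.Tensor, sin: torch.Tensor, cos: torch.Tensor):
    # x: (seq_len, dim). Split channels into even/odd pairs and rotate each pair
    # by the position-dependent angle instead of adding the table to x.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated_even = x1 * cos - x2 * sin
    rotated_odd = x1 * sin + x2 * cos
    return torch.stack((rotated_even, rotated_odd), dim=-1).flatten(-2)

sin, cos = rotary_tables(seq_len=8, dim=64)
q = torch.randn(8, 64)
q_rot = apply_rotary(q, sin, cos)  # rotated queries; keys are treated the same way
```

So the sinusoidal table and RoPE are not really in conflict: the table just supplies the angles that the rotation uses.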