Fine-tuning GPT-J for conversations

Eichhof · July 29, 2022, 10:00pm

Hello

I would like to fine-tune a GPT-J model for conversations and run it locally on my machine. There are two models that I can use: the original GPT-J model or the quantized EleutherAI/gpt-j-6b with 8-bit weights. I have a machine with a 24GB GPU (RTX 3090). How much GPU memory would the original GPT-J model need for fine-tuning and for inference? As far as I understand, the main advantage of the quantized GPT-J is that it needs less GPU memory.
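
For the memory question, a back-of-the-envelope estimate can help frame what fits on a 24GB card. The sketch below assumes roughly 6.05 billion parameters for GPT-J and counts only weights, gradients, and the two Adam moment buffers. Activations, CUDA context, and framework overhead are not included, so real usage will be higher. The function names are made up for illustration.

```python
# Rough GPU memory estimate for GPT-J (weights and optimizer state only).
# Assumption: ~6.05e9 parameters; activations and framework overhead are ignored.
N_PARAMS = 6.05e9

# Storage cost per parameter for common weight formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weights_gb(dtype: str) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return N_PARAMS * BYTES_PER_PARAM[dtype] / 1e9


def full_adam_finetune_gb(dtype: str = "fp32") -> float:
    """Weights + gradients + two Adam moment buffers, all stored in `dtype`."""
    return 4 * weights_gb(dtype)


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"inference weights ({dtype}): {weights_gb(dtype):.1f} GB")
    print(f"full fine-tuning with Adam (fp32): {full_adam_finetune_gb():.1f} GB")
```

Under these assumptions the fp32 weights alone are about as large as the whole 24GB card, fp16 weights take about half of it, and full fine-tuning with Adam in fp32 is far beyond a single RTX 3090 without offloading or parameter-efficient methods.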

Second, I would like to fine-tune the GPT-J model on conversation datasets such as DailyDialog, Blended Skill Talk (but without the different personas), Multi-Session Chat, and Wizard of the Internet.

In general, for fine-tuning GPT-J should I just format the conversation in the following way?

Person_a: Say , Jim , how about going for a few beers after dinner ?
Person_b: You know that is tempting but is really not good for our fitness.
Person_a: What do you mean ? It will help us to relax .
…

Or are other delimiters such as <|endoftext|> necessary? During inference, when the user sends, for example, “Hello, how are you?” to the chatbot, I would then format it as “Person_a: Hello, how are you? Person_b:”.
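
One way to make the formatting question concrete is the sketch below. It assumes one turn per line with alternating Person_a/Person_b prefixes, and it appends <|endoftext|> (the end-of-text token of GPT-J's GPT-2 style tokenizer) after each conversation so separate dialogues do not run together. Whether this is the best format is not settled in this thread; the helper names format_dialogue and build_prompt are hypothetical.

```python
# Sketch of one possible training/inference text format (an assumption, not a confirmed recipe).
EOS = "<|endoftext|>"  # end-of-text token used by the GPT-2 style tokenizer
SPEAKERS = ("Person_a", "Person_b")


def format_dialogue(turns, speakers=SPEAKERS, eos=EOS):
    """Turn a list of utterances into one training example with alternating speakers."""
    lines = [f"{speakers[i % 2]}: {turn.strip()}" for i, turn in enumerate(turns)]
    return "\n".join(lines) + eos


def build_prompt(history, speakers=SPEAKERS):
    """Format the conversation so far and leave the next speaker's turn open."""
    lines = [f"{speakers[i % 2]}: {turn.strip()}" for i, turn in enumerate(history)]
    next_speaker = speakers[len(history) % 2]
    return "\n".join(lines) + f"\n{next_speaker}:"


if __name__ == "__main__":
    print(format_dialogue([
        "Say , Jim , how about going for a few beers after dinner ?",
        "You know that is tempting but is really not good for our fitness.",
    ]))
    print(build_prompt(["Hello, how are you?"]))
```

At inference time, generation would typically be stopped at the next newline or "Person_a:" so the model does not continue writing both sides of the conversation.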

Finally, for fine-tuning I see the following options:

Fine-tuning on only one conversation dataset.
Fine-tuning on several conversation datasets and just stacking the datasets.
Fine-tuning on the first dataset, then fine-tuning on the second dataset and so on.

Which of these three options is best?
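
To show how the second and third options differ in practice, here is a minimal sketch in plain Python. It assumes each dataset is already a list of formatted text examples; the function names are hypothetical, and the sketch does not say which option gives better results.

```python
import random


def stack_datasets(datasets, seed=0):
    """Option 2: merge all datasets into one pool and shuffle it,
    so every training batch can mix examples from all sources."""
    combined = [example for dataset in datasets for example in dataset]
    random.Random(seed).shuffle(combined)
    return combined


def sequential_schedule(datasets):
    """Option 3: yield all examples of the first dataset, then the second, and so on.
    The model sees the last dataset most recently."""
    for dataset in datasets:
        yield from dataset


if __name__ == "__main__":
    daily = ["daily_1", "daily_2"]
    bst = ["bst_1", "bst_2"]
    print(stack_datasets([daily, bst]))
    print(list(sequential_schedule([daily, bst])))
```

The only difference is the order in which examples are presented: stacking interleaves the sources, while the sequential schedule finishes one source before starting the next.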

I’m happy about any input. Thank you very much in advance.

Kickbub · July 29, 2022, 10:19pm

You could use DeepSpeed to reduce the hardware requirements for training.

https://github.com/mallorbc/Finetune_GPTNEO_GPTJ6B
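
As an illustration of what a DeepSpeed setup might look like, the sketch below writes a small config that enables fp16 and ZeRO stage 2 with the optimizer state offloaded to CPU memory. The batch-size values and the file name ds_config.json are assumptions chosen for the example, and this is not taken from the linked repository, which may use different settings.

```python
import json

# Illustrative DeepSpeed config (assumed values, not the linked repo's actual settings).
ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # assumed: small batch to fit a 24GB GPU
    "gradient_accumulation_steps": 8,      # assumed: accumulate to a larger effective batch
    "fp16": {"enabled": True},             # train in half precision
    "zero_optimization": {
        "stage": 2,                        # partition optimizer state and gradients
        "offload_optimizer": {             # keep optimizer state in CPU RAM instead of on the GPU
            "device": "cpu",
            "pin_memory": True,
        },
    },
}

if __name__ == "__main__":
    # Write the config to a file that a training script could be pointed at.
    with open("ds_config.json", "w") as f:
        json.dump(ds_config, f, indent=2)
    print(json.dumps(ds_config, indent=2))
```

Offloading moves the optimizer state into system RAM, so this kind of setup trades GPU memory for a large amount of CPU memory and slower training steps.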

doufulai · January 15, 2023, 8:42am

Have you found an answer? I am looking to fine-tune the model on these conversation datasets as well!
