I would like to fine-tune a GPT-J model for conversations that is running locally on my machine. There are two models that I can use: The original GPT-J model or the Quantized EleutherAI/gpt-j-6b with 8-bit weights. I have a machine with a 24GB GPU (RTX 3090). How much GPU memory would the original GPT-J model need for fine-tuning and for inference? As far as I understand, the main advantage of the quantized GPT-J is that it needs less GPU memory.
In general, for fine-tuning GPT-J should I just format the conversation in the following way?
Person_a: Say , Jim , how about going for a few beers after dinner ?
Person_b: You know that is tempting but is really not good for our fitness.
Person_a: What do you mean ? It will help us to relax .
Or are there any other delimiters such as
<|endoftext|> necessary? During inference, when the user is for example sending “Hello, how are you?” to the chatbot, I would then format it as “Person_a: Hello, how are you? Person_b:”.
Finally, for fine-tuning I see the following options:
- Fine-tuning on only one conversation dataset.
- Fine-tuning on several conversation dataset and just stacking the datasets.
- Fine-tuning on the first dataset, then fine-tuning on the second dataset and so on.
Which of these three options is best?
I’m happy about any input. Thank you very much in advance.