What form does the dataset need to be in for fine-tuning GPT-Neo?

mzapf · December 29, 2022, 7:52pm

I’m pretty new to NLP.
I want to fine-tune the GPT-Neo model (1.3B) with a custom dataset in which the model gets answers or instructions as input and the labels are the desired generated text. I load the tokenizer with tokenizer = GPT2Tokenizer.from_pretrained(model_name), but I don’t understand what form the dataset needs to be in so that it works with the pretrained model. In particular, I don’t understand how to supply the labels correctly for training.
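For context, one common layout for this kind of data with a causal model such as GPT-Neo is to join the input and the desired output into a single token sequence and use that same sequence as the labels, since the Hugging Face causal-LM heads shift the labels internally. The sketch below is only an illustration under stated assumptions, not a definitive recipe: build_example is a hypothetical helper, the token ids are made up, and masking the input part with -100 (the index the loss ignores) is an optional choice.

```python
# Minimal sketch: build one causal-LM training example from already-tokenized
# input (prompt) and target (completion) ids. `build_example` is a hypothetical
# helper, not part of any library.

IGNORE_INDEX = -100  # label value ignored by the cross-entropy loss in PyTorch / Transformers


def build_example(prompt_ids, completion_ids, eos_id, max_len=None):
    """Concatenate prompt and completion; mask the prompt positions in the labels."""
    # The model sees prompt + completion + end-of-text as one sequence.
    input_ids = list(prompt_ids) + list(completion_ids) + [eos_id]
    # Loss is computed only on the completion and the end-of-text token.
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids) + [eos_id]
    if max_len is not None:
        # Simple right truncation; inputs and labels stay aligned.
        input_ids = input_ids[:max_len]
        labels = labels[:max_len]
    return {
        "input_ids": input_ids,
        "attention_mask": [1] * len(input_ids),
        "labels": labels,
    }


# Toy ids for illustration only; 50256 is the GPT-2 tokenizer's <|endoftext|> id.
example = build_example([10, 11, 12], [20, 21], eos_id=50256)
print(example["input_ids"])
print(example["labels"])
```

In practice the id lists would come from the tokenizer, for example tokenizer(text)["input_ids"]. If the loss should also cover the input part, the labels can simply be a copy of input_ids instead of using the -100 mask.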

Thanks for the help.
