I’m pretty new to NLP.
I want to fine-tune the GPT-Neo model (1.3B) on a custom dataset in which the inputs are answers or instructions and the labels are the desired generated text. I load the tokenizer with tokenizer = GPT2Tokenizer.from_pretrained(model_name), but I don’t really understand what form the dataset needs to take so that it works with the pretrained model. In particular, I don’t understand how to provide the labels correctly for training.
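To make the question concrete, here is a minimal sketch of one layout commonly used for causal language models like this one. It assumes the Hugging Face convention that labels have the same length as input_ids (the model shifts them internally) and that positions set to -100 are ignored by the loss. The helpers build_example, pad_example, and toy_encode are hypothetical names for illustration only. toy_encode stands in for tokenizer(text)["input_ids"] so the snippet runs without downloading anything, and 50256 is assumed as the end-of-text token ID of the GPT-2 vocabulary.

```python
from typing import Dict, List

IGNORE_INDEX = -100   # label value skipped by the loss (assumed HF/PyTorch convention)
EOS_ID = 50256        # end-of-text ID in the GPT-2 vocabulary (assumption for this sketch)


def toy_encode(text: str) -> List[int]:
    """Hypothetical stand-in for tokenizer(text)["input_ids"]: one ID per character."""
    return [ord(ch) for ch in text]


def build_example(prompt_ids: List[int], target_ids: List[int],
                  eos_id: int, max_length: int) -> Dict[str, List[int]]:
    """Concatenate prompt and target; mask the prompt so only the target is trained on."""
    input_ids = (prompt_ids + target_ids + [eos_id])[:max_length]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + target_ids + [eos_id])[:max_length]
    return {
        "input_ids": input_ids,
        "attention_mask": [1] * len(input_ids),
        "labels": labels,
    }


def pad_example(example: Dict[str, List[int]], length: int,
                pad_id: int) -> Dict[str, List[int]]:
    """Right-pad to a fixed length; padded positions are masked out of the loss."""
    n_pad = length - len(example["input_ids"])
    return {
        "input_ids": example["input_ids"] + [pad_id] * n_pad,
        "attention_mask": example["attention_mask"] + [0] * n_pad,
        "labels": example["labels"] + [IGNORE_INDEX] * n_pad,
    }


prompt_ids = toy_encode("Q: 2+2?\n")   # the instruction given as input
target_ids = toy_encode("A: 4")        # the desired generated text
example = build_example(prompt_ids, target_ids, EOS_ID, max_length=32)
padded = pad_example(example, length=16, pad_id=EOS_ID)
```

With a real tokenizer, toy_encode would be replaced by the tokenizer call, and dictionaries like padded would be the items of the training dataset. Masking the prompt is a design choice: setting labels equal to input_ids everywhere would also train the model to reproduce the instruction itself.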
Thanks for the help.