Hello everyone!
I’ve been trying to fine-tune a GPT-based model using the SFTTrainer from the TRL library. It is my understanding that you can pass a non-tokenized dataset directly and the SFTTrainer class handles the tokenization internally.
However, after defining my dataset (with only one column, named “text”), the error “you should provide a list of encodings but you have provided none” is raised.
What could be the problem here?
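For reference, here is a minimal sketch of the kind of setup described above. It is not my exact code: the rows, the `gpt2` checkpoint, the `sft-output` directory, and the helper names `check_text_column` and `build_trainer` are all placeholders made up for illustration. It also assumes a recent TRL release in which `dataset_text_field` is set on `SFTConfig`; older releases took it as an argument to `SFTTrainer` directly, so this may need adjusting for your version.

```python
# Hypothetical example rows; the only column is "text".
records = [
    {"text": "The quick brown fox jumps over the lazy dog."},
    {"text": "SFTTrainer can tokenize raw text for you."},
]


def check_text_column(rows, field="text"):
    """Return indices of rows whose `field` is missing or not a non-empty string."""
    return [
        i
        for i, row in enumerate(rows)
        if not isinstance(row.get(field), str) or not row[field].strip()
    ]


def build_trainer(rows, model_name="gpt2", output_dir="sft-output"):
    """Build an SFTTrainer on raw text (assumes a TRL version that provides SFTConfig)."""
    # Imported lazily so the sanity check above runs without these libraries.
    from datasets import Dataset
    from trl import SFTConfig, SFTTrainer

    dataset = Dataset.from_list(rows)
    # Tell the trainer which column holds the raw, non-tokenized text.
    config = SFTConfig(output_dir=output_dir, dataset_text_field="text")
    return SFTTrainer(model=model_name, args=config, train_dataset=dataset)


# Sanity-check the data before handing it to the trainer.
bad_rows = check_text_column(records)
print("rows with a missing or empty 'text' field:", bad_rows)
```

Calling `build_trainer(records).train()` would then start training. Checking the column first is only a guess at one possible cause: if the text field is missing or misnamed, the trainer may end up with nothing to tokenize.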
Intermediate #TRL #SFTTrainer
See my answer here: Fine tune with SFTTrainer - #8 by nielsr