Can I pass a text file to the tokenizer?

kuttersn, July 1, 2022, 10:07am

Do I have to use the `datasets` library to fine-tune GPT-2? If so, should I add the special tokens to the file before passing it to the tokenizer?
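A minimal sketch of one way to prepare a plain text file without the `datasets` library, assuming the file is UTF-8 and that documents are separated by blank lines (both assumptions). The tokenizer itself only needs Python strings, so the file can be read with ordinary file I/O. Instead of typing special tokens into the file, the sketch appends an end-of-text id after each tokenized document (for GPT-2 this would be `tokenizer.eos_token_id`, the id of `<|endoftext|>`). It then cuts the id stream into fixed-size blocks, roughly what the old `TextDataset` did. The helper names `split_documents`, `load_documents`, `tokenize_documents` and `group_into_blocks` are hypothetical, and `encode` stands for any callable that turns a string into token ids.

```python
from pathlib import Path
from typing import Callable, List

# Hypothetical helpers; blank-line document separation is an assumption.


def split_documents(text: str, sep: str = "\n\n") -> List[str]:
    """Split raw text into non-empty, stripped documents."""
    return [doc.strip() for doc in text.split(sep) if doc.strip()]


def load_documents(path: str) -> List[str]:
    """Read a UTF-8 text file from disk and split it into documents."""
    return split_documents(Path(path).read_text(encoding="utf-8"))


def tokenize_documents(
    docs: List[str], encode: Callable[[str], List[int]], eos_id: int
) -> List[int]:
    """Encode each document and append the end-of-text id after it."""
    ids: List[int] = []
    for doc in docs:
        ids.extend(encode(doc))
        ids.append(eos_id)  # marks the boundary between documents
    return ids


def group_into_blocks(ids: List[int], block_size: int) -> List[List[int]]:
    """Cut one long id stream into equal-length training blocks.

    Any trailing remainder shorter than block_size is dropped.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    usable = (len(ids) // block_size) * block_size
    return [ids[i : i + block_size] for i in range(0, usable, block_size)]
```

With the Hugging Face GPT-2 tokenizer, `encode` could be `lambda s: tokenizer(s)["input_ids"]` after `tokenizer = AutoTokenizer.from_pretrained("gpt2")`. The GPT-2 tokenizer does not add special tokens on its own by default, which is why the end-of-text id is appended explicitly here. The resulting blocks are plain lists of ids that could then be wrapped in a small PyTorch `Dataset` and passed to `Trainer`.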

Related topics

Topic	Category	Replies	Views	Activity
GPT-2 Data Preparation for Parsing Trees	Intermediate	0	124	May 6, 2024
How did the dataset manages long sentences?	🤗Datasets	1	985	February 15, 2022
Help understanding how to build a dataset for language as with the old TextDataset	🤗Datasets	7	12719	October 6, 2021
Using Tokenizer for integer data	🤗Tokenizers	0	531	January 3, 2023
Training GPT-2 from scratch	Beginners	2	1230	August 3, 2020