Unable to use custom dataset when training a tokenizer

Okay, thanks for that. I have trained my own tokenizer from scratch; how do I now use it in the masked language modeling (MLM) task?
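
In case it helps, here is a minimal sketch of one way to wire a from-scratch tokenizer into an MLM setup, assuming the tokenizer was trained with the Hugging Face `tokenizers` library and saved as `tokenizer.json` (the file path, special tokens, and the BERT-style config are all assumptions, not part of your setup):

```python
from transformers import (
    PreTrainedTokenizerFast,
    BertConfig,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
)

# Wrap the trained tokenizer file so it works with transformers.
# "tokenizer.json" is a hypothetical path; adjust to wherever you saved yours.
tokenizer = PreTrainedTokenizerFast(
    tokenizer_file="tokenizer.json",
    unk_token="[UNK]",
    pad_token="[PAD]",
    mask_token="[MASK]",  # required for masked language modeling
)

# A fresh BERT-style model sized to the custom vocabulary (assumed architecture).
config = BertConfig(vocab_size=tokenizer.vocab_size)
model = BertForMaskedLM(config)

# The collator dynamically masks ~15% of tokens to create the MLM objective;
# pass it to a Trainer along with your tokenized dataset.
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)
```

From there you would tokenize your dataset with this `tokenizer` and hand `model`, `data_collator`, and the tokenized dataset to a `Trainer` as in the standard MLM examples.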