Custom Dataset with Custom Tokenizer

I trained a BPE tokenizer on WikiText and now I'm trying to use this tokenizer on a custom dataset loaded from a CSV file. What I want is to add the tokenizer's outputs as feature columns in my dataset, but dataset.map is giving an error.

You should just use the tokenizer's __call__ method: tokenizer(example["text"]).

When I train the tokenizer following this Quicktour — tokenizers documentation,
the resulting Tokenizer object doesn't implement __call__.

Oh, you should wrap your tokenizer in a PreTrainedTokenizerFast from the Transformers library (you can just pass your tokenizer with the tokenizer_object keyword argument).
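Concretely, something along these lines (a sketch: the training corpus is in-memory here instead of the wiki-text files, and the special tokens are assumptions):

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace
from transformers import PreTrainedTokenizerFast

# Train a small BPE tokenizer, as in the tokenizers quicktour
bpe = Tokenizer(BPE(unk_token="[UNK]"))
bpe.pre_tokenizer = Whitespace()
trainer = BpeTrainer(special_tokens=["[UNK]", "[PAD]"])
bpe.train_from_iterator(["some training text", "more training text"], trainer=trainer)

# Wrap it: PreTrainedTokenizerFast provides __call__, padding, truncation, etc.
tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=bpe,
    unk_token="[UNK]",
    pad_token="[PAD]",
)

encoded = tokenizer("some more text", truncation=True)
print(encoded["input_ids"])
```

If you already saved the trained tokenizer to disk, you can load it with Tokenizer.from_file("tokenizer.json") instead of retraining before wrapping it.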
