I trained a BPE tokenizer on WikiText and now I’m trying to use this tokenizer on a custom dataset from a CSV file. What I want to achieve is the addition of feature columns to my dataset, but `dataset.map` is giving an error.
You should just use the tokenizer `__call__`: `tokenizer(example["text"])`.
When I train the tokenizer following this Quicktour — tokenizers documentation, the `__call__` function is not implemented.
Oh, you should wrap your tokenizer in a `PreTrainedTokenizerFast` from the Transformers library (you can just pass your tokenizer with the `tokenizer_object` keyword argument).