Programmatic Way to Tokenize Custom Text Columns

I’m having trouble tokenizing in a more programmatic way. I can’t figure out how to pass multiple arguments to the map function, and I can’t run map on just a particular column of the dataset. I also can’t use Python’s built-in map function because I can’t figure out how to make it return a Dataset object (although that could be my own ignorance). Any thoughts?
