In this tutorial (Processing the data - Hugging Face Course), they pass a collection of inputs into Tokenizer.tokenize(). But when I try to pass a list of texts, I get an error, and the source code (transformers/tokenization_utils.py at main · huggingface/transformers · GitHub) suggests it only accepts a single input. So how do I tokenize many items at once, as they seem to do in the tutorial?
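
For context, here is a minimal sketch of what I'm trying (the checkpoint name and example sentences are just placeholders, not taken from the tutorial):

```python
from transformers import AutoTokenizer

# assuming a BERT checkpoint similar to the one used in the course
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

sentences = [
    "This is the first sentence.",
    "This is the second one.",
]

# passing the whole list to tokenize() is what fails for me
tokens = tokenizer.tokenize(sentences)
```

Calling `tokenizer.tokenize()` on each string one by one works fine, and calling the tokenizer object directly on the list (e.g. `tokenizer(sentences, padding=True, truncation=True)`) also seems to accept a batch, so I'm unsure whether that's what the tutorial intends or whether `tokenize()` itself is supposed to handle lists.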