In this tutorial (Processing the data - Hugging Face Course), they pass a collection of inputs into Tokenizer.tokenize(). But when I try to pass a list of texts, I get an error, and the source code (transformers/tokenization_utils.py at main · huggingface/transformers · GitHub) suggests it only accepts a single input. So how do I tokenize many items at once, as they seem to do in the tutorial?
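
For context, here is a minimal sketch of what I'm trying (the checkpoint name and example sentences are just placeholders, not taken from the tutorial):

```python
from transformers import AutoTokenizer

# assuming a BERT checkpoint similar to the one used in the course
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

sentences = [
    "This is the first sentence.",
    "This is the second one.",
]

# passing the whole list to tokenize() is what fails for me
tokens = tokenizer.tokenize(sentences)
```

Calling `tokenizer.tokenize()` on each string one by one works fine, and calling the tokenizer object directly on the list (e.g. `tokenizer(sentences, padding=True, truncation=True)`) also seems to accept a batch, so I'm unsure whether that's what the tutorial intends or whether `tokenize()` itself is supposed to handle lists.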