Trying to use AutoTokenizer with TensorFlow gives: `ValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples).`

```python
def tokenize(batch):
    texts = [str(text) for text in batch["text"]]  # cast every entry to str
    return tokenizer(texts, padding=True, truncation=True)

emotions_encoded = emotions.map(tokenize, batched=True, batch_size=None)
```

It works!
