Hi,
I have two split datasets of the same type: tweets with labels for sequence classification. I create both the exact same way, from pandas DataFrames. They have the same columns, texts and labels, before the dataset conversion, and afterwards labels, input_ids, and attention_mask.
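For context, both are converted roughly like this (a minimal sketch; the DataFrame contents and variable names are placeholders):

```python
import pandas as pd
from datasets import Dataset

# Both DataFrames have the same two columns: "texts" and "labels".
df_a = pd.DataFrame({"texts": ["some tweet", "another tweet"], "labels": [0, 1]})
df_b = pd.DataFrame({"texts": ["a third tweet", "one more tweet"], "labels": [1, 0]})

# Identical conversion for both.
dataset_a = Dataset.from_pandas(df_a)
dataset_b = Dataset.from_pandas(df_b)
```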
For one of them I can call Trainer.train() without problems, but for the other I get this error:
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length.
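The training call itself is roughly this (a sketch: the checkpoint name and arguments are placeholders, and tokenized_a / tokenized_b are the outputs of the tokenize function shown further down):

```python
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",  # placeholder checkpoint
    num_labels=2,
)
args = TrainingArguments(output_dir="out")

# Works fine with tokenized_a; raises the ValueError above with tokenized_b.
trainer = Trainer(model=model, args=args, train_dataset=tokenized_b)
trainer.train()
```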
At first I thought it might be due to longer texts in the dataset where the error occurs, but the other one actually has a longer maximum sequence, and the mean length is about the same, roughly 115 characters. The minimum length is exactly the same: 13 characters. There are no None or NaN values.
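These are the checks I ran, more or less (a sketch using the DataFrames from above):

```python
# Character-length statistics per dataset.
for name, df in [("a", df_a), ("b", df_b)]:
    lengths = df["texts"].str.len()
    print(name, lengths.min(), lengths.mean(), lengths.max())

# Missing values in either column.
print(df_a.isna().sum())
print(df_b.isna().sum())
```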
Can someone point me to what this error means?
This is the tokenize function I use, taken from the docs:

```python
def tokenize(batch):
    return tokenizer(batch["texts"], padding=True, truncation=True)
```
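I apply it to both datasets the same way, roughly like this (a sketch; the tokenizer checkpoint is a placeholder):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")  # placeholder

# batched=True so tokenize() receives a batch dict, as in the docs.
tokenized_a = dataset_a.map(tokenize, batched=True)
tokenized_b = dataset_b.map(tokenize, batched=True)
```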
edit 1: Hmm, could it be because of emojis like smiley faces being present?
edit 2: Hmm, no. With all emojis removed I still get the same error.
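(For reference, I stripped the emojis roughly like this; the regex is a sketch covering common emoji code-point blocks, not necessarily exhaustive:)

```python
import re

# Rough emoji stripper: common emoji / symbol / flag code-point ranges.
emoji_pattern = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]"
)

df_b["texts"] = df_b["texts"].str.replace(emoji_pattern, "", regex=True)
```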