ValueError: Unable to create tensor for one dataset but not the other of the same type

Hi,

I have two splits of the same kind of dataset: tweets and labels for sequence classification.
I create both the exact same way, from pandas DataFrames.
They have the same columns: texts and labels before the dataset conversion, and
labels, input_ids and attention_mask afterwards.
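For reference, both splits are built roughly like this (simplified; the example rows are just placeholders):

from datasets import Dataset
import pandas as pd

# each split starts as a pandas DataFrame with a "texts" and a "labels" column
df = pd.DataFrame({"texts": ["example tweet one", "example tweet two"],
                   "labels": [0, 1]})
dataset = Dataset.from_pandas(df)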

For one of them I can call Trainer.train(),
but for the other I get this error:
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length.

At first I thought it might be because the dataset where the error occurs has longer texts, but the other one actually has a longer max sequence, and the mean length is about the same (roughly 115 characters).
The min length is exactly the same: 13.
There are no None or NaN values.

Can someone point me to what this means?
This is the tokenize function I use, from the docs:

def tokenize(batch):
    return tokenizer(batch["texts"], padding=True, truncation=True)
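and it gets applied with the batched map from the docs, something like this (the checkpoint name is just a placeholder):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # placeholder checkpoint
tokenized = dataset.map(tokenize, batched=True)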

edit 1:
Hmm, could it be because of emojis like smiley faces being present?
edit 2:
Hmm, no. With all emojis removed I still get the same error.

OMG, I am the worst.
The labels in the other dataset were numbers stored as strings instead of ints.
Changing them to int fixed it.
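In case anyone else runs into this, the fix is just casting the label column to int before building the dataset, roughly like this (simplified to my column name):

# labels were stored as strings like "0" / "1"; a string label can't be
# packed into a tensor alongside the tokenized inputs, which surfaces as the
# misleading padding/truncation ValueError above
df["labels"] = df["labels"].astype(int)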