I tokenize the dataset as follows:
import datasets

cola = datasets.load_dataset('linxinyuan/cola')
cola_tokenized = cola.map(lambda examples: tokenizer(examples['text'], padding=True, truncation=True), batched=True, batch_size=16)
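One thing I noticed: because padding=True is applied per map batch of 16, the padded lengths are not uniform across the whole dataset. A quick check (the 'train' split name is assumed here; input_ids is the standard tokenizer output column):

# Inspect padded lengths of the first 64 examples (assumed 'train' split)
lengths = {len(ids) for ids in cola_tokenized['train'][:64]['input_ids']}
print(lengths)  # prints several distinct lengths, one per map batch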
However, if I do not pass tokenizer=tokenizer to the Trainer, I get an error about tensor size mismatches.
Why do I need to pass the tokenizer to Trainer if my data is already tokenized?
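For reference, this is roughly the Trainer setup I mean (the model checkpoint and training arguments here are placeholders, not my exact values):

from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

# Placeholder checkpoint; my real model differs
model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

args = TrainingArguments(output_dir='out', per_device_train_batch_size=32)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=cola_tokenized['train'],
    tokenizer=tokenizer,  # removing this kwarg is what triggers the size-mismatch error
)
trainer.train()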