Can't iterate through the DataLoader after adding dynamic padding

I want to do text generation, and I'm trying to use the GPT-2 model and tokenizer for that purpose.

I am at the stage of adding dynamic padding to the dataset via a data collator, but I can't iterate through the DataLoader afterwards (I'm assuming the padding step is the cause). It gives this error:

Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length.

when I try to iterate over it with this code:

for step, batch in enumerate(train_dataloader):
  print(batch)
  if step>5:
    break
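For context, here is my understanding of what the dynamic padding step is supposed to do, as a toy sketch in plain Python (50256 is hard-coded here as GPT-2's eos/pad id purely for illustration; it's not my actual collator):

```python
PAD_ID = 50256  # GPT-2's eos token id, reused as the pad id (for illustration)

def pad_batch(batch):
    """Pad each sequence in a batch to the batch's longest length."""
    max_len = max(len(seq) for seq in batch)
    input_ids = [seq + [PAD_ID] * (max_len - len(seq)) for seq in batch]
    # Mask out the padded positions so the model ignores them.
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[10, 11, 12], [20, 21]])
# The second sequence gets padded to length 3 with mask [1, 1, 0].
```

As far as I understand, `DataCollatorWithPadding` should be doing the equivalent of this for each batch, which is why the error confuses me.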

Full code:

import pandas as pd
from datasets import load_dataset
from transformers import GPT2TokenizerFast
from transformers import DataCollatorWithPadding
from torch.utils.data import DataLoader
import torch

raw_datasets = load_dataset("csv", data_files="dataset.csv", sep=";")
raw_datasets

raw_train_datasets = raw_datasets["train"]
raw_train_datasets

checkpoint = "gpt2"
tokenizer = GPT2TokenizerFast.from_pretrained(checkpoint)
tokenizer.pad_token = tokenizer.eos_token

def tokenize_function(example):
    return tokenizer(example["sentence"], truncation=True)

tokenized_datasets = raw_train_datasets.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(["sentence"])
tokenized_datasets = tokenized_datasets.with_format("torch")

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
train_dataloader = DataLoader(tokenized_datasets["input_ids"], batch_size=16, shuffle=True, collate_fn=data_collator)
train_dataloader

for step, batch in enumerate(train_dataloader):
  print(batch)
  if step>5:
    break
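In case it helps narrow things down, a toy stand-in for my examples (not my real data) shows why unequal lengths would matter here: sentences tokenize to different lengths, so a batch can't form a rectangular tensor without padding, which matches the wording of the error:

```python
# Toy stand-in for two tokenized examples whose sentences differ in length.
examples = [
    {"input_ids": [50256, 10, 11]},
    {"input_ids": [50256, 20]},
]
lengths = {len(ex["input_ids"]) for ex in examples}
# More than one distinct length means the batch can't be stacked into a
# single tensor as-is, hence (I assume) the "batched tensors with the
# same length" message.
assert len(lengths) > 1
```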

Please help, I'm trying to debug this but can't get anywhere…

@sgugger please help