Hi! The batch_size argument of map() only controls how many examples are passed to your function at a time inside the map operation. It doesn't batch the output of the dataset when you iterate over it. If you want your data loader to yield batches, you should pass batch_size to the DataLoader itself:
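from torch.utils.data import DataLoader

train_dl = DataLoader(
    train_dataset.map(
        collate_fn,
        batched=True,
        batch_size=10,  # chunk size seen by collate_fn inside map()
        remove_columns=["url", "short_caption", "caption"],
    ),
    batch_size=10,  # batch size yielded when iterating over train_dl
)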
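
If it helps to see the difference in isolation, here is a small library-free sketch. The helpers `toy_map` and `toy_loader` are hypothetical stand-ins I made up for illustration, not the actual `datasets` or PyTorch implementations. They only mimic the behaviour described above: the batched map hands chunks to your function but stores single rows, while the loader is what groups rows into batches during iteration.

```python
from itertools import islice

def toy_map(rows, fn, batch_size):
    """Toy stand-in for a batched map: fn sees chunks, the result is flat rows."""
    out = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        out.extend(fn(chunk))  # results are flattened back into single rows
    return out

def toy_loader(rows, batch_size):
    """Toy stand-in for a data loader: groups rows into lists when iterated."""
    it = iter(rows)
    while chunk := list(islice(it, batch_size)):
        yield chunk

rows = list(range(25))
seen_chunk_sizes = []  # records how many rows the function received per call

def double(chunk):
    seen_chunk_sizes.append(len(chunk))
    return [2 * x for x in chunk]

mapped = toy_map(rows, double, batch_size=10)
first_item = mapped[0]  # a single row, not a batch of 10
batch_sizes = [len(b) for b in toy_loader(mapped, batch_size=10)]

print(seen_chunk_sizes)  # chunk sizes inside the map
print(first_item)        # one element of the mapped output
print(batch_sizes)       # batch sizes produced by the loader
```

In this sketch the function is called on chunks of up to 10 rows, but indexing the mapped result still gives one row at a time. Only the loader step turns those rows into batches.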