Using map takes 7.2x longer than set_transform

Hello! I am working on a vision use case. I am using this processor:

from transformers import ViTImageProcessor
processor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224')

When I use the map function:

def apply_processor(example):
  example['pixel_values'] = processor(example['image'].convert("RGB"), return_tensors="pt").pixel_values.squeeze()
  return example

processed_dataset = pet_dataset.map(apply_processor)
processed_dataset.set_format("torch")

trainer = Trainer(
    model,
    args,
    train_dataset=processed_dataset['train'],
    eval_dataset=processed_dataset['validation'],
    data_collator=collate_fn,
    compute_metrics=compute_metrics,
    tokenizer=processor,
)

It takes 3 hours to complete 5 epochs. But when I use set_transform:

def transform_processor(batch):
  batch['pixel_values'] = [processor(image.convert("RGB"), return_tensors="pt").pixel_values.squeeze() for image in batch['image']]
  return batch
pet_dataset.set_transform(transform_processor)

import torch

trainer = Trainer(
    model,
    args,
    train_dataset=pet_dataset['train'],
    eval_dataset=pet_dataset['validation'],
    data_collator=collate_fn,
    compute_metrics=compute_metrics,
    tokenizer=processor,
)

It takes 25 minutes to complete the same 5 epochs. Why is this happening? Intuitively, I thought map would be faster, because it applies the processor once before training starts, while set_transform is supposed to do the same work on the fly during training.