Hello! I am working on a vision use case, using this processor:
from transformers import ViTImageProcessor
processor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224')
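As I understand it, the processor mainly resizes to 224x224, rescales to [0, 1], and normalizes with mean/std 0.5 before moving channels first. A toy sketch of that mental model (simplified NumPy, not the actual ViTImageProcessor code):

```python
import numpy as np

# Simplified sketch of what I understand the processor to do per image:
# rescale uint8 pixels to [0, 1], normalize with mean=std=0.5, channels-first.
# (Assumes the image is already 224x224; the real processor also resizes.)
def toy_process(image: np.ndarray) -> np.ndarray:
    x = image.astype(np.float32) / 255.0          # rescale to [0, 1]
    mean = np.array([0.5, 0.5, 0.5], dtype=np.float32)
    std = np.array([0.5, 0.5, 0.5], dtype=np.float32)
    x = (x - mean) / std                          # normalize to roughly [-1, 1]
    return x.transpose(2, 0, 1)                   # HWC -> CHW, shape (3, 224, 224)

img = np.zeros((224, 224, 3), dtype=np.uint8)
out = toy_process(img)
assert out.shape == (3, 224, 224)
```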
When I use the map function:
def apply_processor(example):
    example['pixel_values'] = processor(example['image'].convert("RGB"), return_tensors="pt").pixel_values.squeeze()
    return example

processed_dataset = pet_dataset.map(apply_processor)
processed_dataset.set_format("torch")
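My mental model of map here is that it applies the function eagerly, once per example, before training ever starts. A toy sketch of that understanding (plain Python, not the real datasets internals):

```python
# Toy model of my understanding of Dataset.map: the function runs eagerly
# over every example up front, and the results are materialized once.
calls = 0

def fake_processor(example):
    global calls
    calls += 1
    example["pixel_values"] = [v * 2 for v in example["image"]]
    return example

raw = [{"image": [1, 2]}, {"image": [3, 4]}]
mapped = [fake_processor(dict(ex)) for ex in raw]  # all the work happens here

assert calls == 2                        # every example processed before "training"
assert mapped[0]["pixel_values"] == [2, 4]
```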
trainer = Trainer(
    model,
    args,
    train_dataset=processed_dataset['train'],
    eval_dataset=processed_dataset['validation'],
    data_collator=collate_fn,
    compute_metrics=compute_metrics,
    tokenizer=processor,
)
It takes 3 hours to complete 5 epochs. But when I use set_transform instead:
def transform_processor(batch):
    batch['pixel_values'] = [processor(image.convert("RGB"), return_tensors="pt").pixel_values.squeeze() for image in batch['image']]
    return batch

pet_dataset.set_transform(transform_processor)
import torch
trainer = Trainer(
    model,
    args,
    train_dataset=pet_dataset['train'],
    eval_dataset=pet_dataset['validation'],
    data_collator=collate_fn,
    compute_metrics=compute_metrics,
    tokenizer=processor,
)
It takes only 25 minutes to complete the same 5 epochs. Why is this happening? Intuitively, I thought map would be faster, since it applies the processor once before training, whereas set_transform is supposed to do the processing on the fly.