Using map takes 7.2x longer than set_transform

Hello! I am working on a vision use case. I am using this processor:

from transformers import ViTImageProcessor
processor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224')

When I use the map function:

def apply_processor(example):
  example['pixel_values'] = processor(example['image'].convert("RGB"), return_tensors="pt").pixel_values.squeeze()
  return example

processed_dataset = pet_dataset.map(apply_processor)
processed_dataset.set_format("torch")

trainer = Trainer(
    model,
    args,
    train_dataset=processed_dataset['train'],
    eval_dataset=processed_dataset['validation'],
    data_collator=collate_fn,
    compute_metrics=compute_metrics,
    tokenizer=processor,
)

It takes 3 hours to complete 5 epochs. But when I use set_transform:

def transform_processor(batch):
  batch['pixel_values'] = [processor(image.convert("RGB"), return_tensors="pt").pixel_values.squeeze() for image in batch['image']]
  return batch
pet_dataset.set_transform(transform_processor)

import torch

trainer = Trainer(
    model,
    args,
    train_dataset=pet_dataset['train'],
    eval_dataset=pet_dataset['validation'],
    data_collator=collate_fn,
    compute_metrics=compute_metrics,
    tokenizer=processor,
)

It takes 25 minutes to complete the same 5 epochs. Why is this happening? Intuitively, I thought map would be faster, because it applies the processor once before training starts, while set_transform is supposed to do the same work on the fly during training.