Image dataset with_transform not applied

dimitrije-it · July 22, 2024, 11:44pm

Hi,

I am training a computer vision model and want to use AutoImageProcessor to prepare the images for the model.

When I use with_transform and then use a Trainer, the transformation is not applied.

The dataset contains PIL images.

from transformers import AutoImageProcessor
image_processor = AutoImageProcessor.from_pretrained('google/vit-base-patch16-224-in21k', use_fast=True)

def transform(example):
    ds = {}
    ds['image'] = image_processor(example, return_tensors='pt')['pixel_values'].reshape(3,224,224)
    return ds

dataset = dataset.with_transform(transform)

When I use .map it works, but I have a large dataset and map takes up too much space.

Any ideas why the with_transform function is not called at all? I have also tried a DataLoader, and the transformation is not applied there either.

qubvel-hf · July 25, 2024, 4:40pm

Hi @dimitrije-it, with_transform works fine for the following snippet:

import datasets
print("Datasets version:", datasets.__version__)

from transformers import AutoImageProcessor
from datasets import load_dataset

image_processor = AutoImageProcessor.from_pretrained('google/vit-base-patch16-224-in21k', use_fast=True)
dataset = load_dataset("zh-plus/tiny-imagenet")["train"]

def transform(example):
    inputs = image_processor(example["image"], return_tensors='pt')['pixel_values'].reshape(3,224,224)
    return {
        "image": inputs,
        "label": example["label"]
    }

dataset = dataset.with_transform(transform)
print(dataset[0])

Output:

Datasets version: 2.20.0
{'image': tensor([[ 1.0000,  1.0000,  1.0000,  ...,  0.1373, -0.0039, -0.0039],
        [ 1.0000,  1.0000,  1.0000,  ...,  0.1373, -0.0039, -0.0039],
        [ 1.0000,  1.0000,  1.0000,  ...,  0.1294, -0.0118, -0.0118],
        ...,
        [-0.2863, -0.2863, -0.2863,  ..., -0.4039, -0.4118, -0.4118],
        [-0.2863, -0.2863, -0.2863,  ..., -0.4275, -0.4353, -0.4353],
        [-0.2863, -0.2863, -0.2863,  ..., -0.4275, -0.4353, -0.4353]]), 'label': 0}
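A possible explanation for the Trainer case, offered as a sketch rather than a confirmed diagnosis: by default Trainer uses remove_unused_columns=True, which drops dataset columns that the model's forward() does not accept, so the raw image column can be removed before the transform ever sees it. Note also that the function passed to with_transform receives a batch (a dict of lists), even for dataset[0], so it is safer not to hard-code a single-image reshape. The example below assumes the column names image and label from the snippet above; make_transform and demo are hypothetical helper names, and the output directory is a placeholder.

```python
def make_transform(image_processor):
    """Build a batch-level function suitable for Dataset.with_transform."""

    def transform(batch):
        # `batch` is a dict of lists, e.g. {"image": [PIL, ...], "label": [0, ...]}.
        inputs = image_processor(batch["image"], return_tensors="pt")
        # Keep the batch dimension: no reshape to a single (3, 224, 224) image.
        return {"pixel_values": inputs["pixel_values"], "labels": batch["label"]}

    return transform


def demo():
    """Not called automatically: downloads the model config and dataset."""
    from datasets import load_dataset
    from transformers import AutoImageProcessor, TrainingArguments

    image_processor = AutoImageProcessor.from_pretrained(
        "google/vit-base-patch16-224-in21k"
    )
    dataset = load_dataset("zh-plus/tiny-imagenet")["train"]
    dataset = dataset.with_transform(make_transform(image_processor))
    print(dataset[0]["pixel_values"].shape)

    # Assumption: keeping unused columns lets the transform see "image".
    # "out" is a placeholder output directory.
    args = TrainingArguments(output_dir="out", remove_unused_columns=False)
    return dataset, args
```

The returned args would then be passed to Trainer together with the transformed dataset. If the transformation still seems to be skipped, printing inside transform is a quick way to check whether it is called at all.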
