Fine-tuning image classification with data augmentation using Trainer

Hello

Suppose that I want to fine-tune a Hugging Face ResNet model for image classification.

from transformers import AutoImageProcessor, AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained("microsoft/resnet-50", num_labels=num_labels)
image_processor = AutoImageProcessor.from_pretrained("microsoft/resnet-50")

I also want to perform data augmentation, so I did:

from torchvision import transforms

augment_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation((-45, 45)),
])

Then I define a collator function:

import torch

def collator_fn(data):
    # Augment each image, then run it through the image processor
    pixel_values = [
        image_processor(augment_transform(x['image']), do_resize=False, return_tensors="pt")['pixel_values']
        for x in data
    ]
    return {
        'pixel_values': torch.cat(pixel_values),
        'labels': torch.tensor([x['label'] for x in data]),
    }
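
As a quick sanity check, the collator can be called by hand on a couple of examples. This assumes each dataset item is a dict with a PIL image under 'image' and an integer label under 'label' (and that the images all have the same size, since do_resize=False):

batch = collator_fn([train_dataset[0], train_dataset[1]])
print(batch['pixel_values'].shape)  # (batch_size, 3, height, width)
print(batch['labels'])              # tensor with the two labels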

And use it in a Trainer:

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=collator_fn,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
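
For context, training_args is an ordinary TrainingArguments object. The exact values below are just placeholders; the one flag that matters for this setup is remove_unused_columns=False, so the raw 'image' column still reaches the collator:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="resnet50-finetuned",       # placeholder values
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=5,
    learning_rate=5e-5,
    remove_unused_columns=False,  # keep the 'image' column so the collator can see it
)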

I have the following questions:

  1. Is this the correct way of fine-tuning with data augmentation?
  2. Will the collator_fn be called on eval_dataset?
  3. If I call 'trainer.evaluate(eval_dataset=test_dataset)', will the collator_fn be called on test_dataset?
  4. If I don’t want data augmentation on eval_dataset and test_dataset, what should I do?
  5. Or should I define a custom Dataset class that performs both preprocessing and data augmentation? (A rough sketch of what I mean is below.)
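
For question 5, this is roughly what I have in mind (just a sketch; raw_train / raw_eval / raw_test are placeholders for the unprocessed splits, and I assume each item has a PIL image under 'image' and an integer label under 'label'): a thin Dataset wrapper that always applies the image processor and applies the augmentation only when one is passed in, so the training split is augmented and the eval/test splits are not.

from torch.utils.data import Dataset

class ProcessedImageDataset(Dataset):
    """Applies the image processor to every example; applies the
    augmentation only if one is provided (training split only)."""

    def __init__(self, raw_dataset, image_processor, augment_transform=None):
        self.raw_dataset = raw_dataset
        self.image_processor = image_processor
        self.augment_transform = augment_transform

    def __len__(self):
        return len(self.raw_dataset)

    def __getitem__(self, idx):
        example = self.raw_dataset[idx]
        image = example['image']
        if self.augment_transform is not None:
            image = self.augment_transform(image)
        pixel_values = self.image_processor(
            image, do_resize=False, return_tensors="pt"
        )['pixel_values'][0]
        return {'pixel_values': pixel_values, 'labels': example['label']}

# Augmentation only on the training split:
# train_dataset = ProcessedImageDataset(raw_train, image_processor, augment_transform)
# eval_dataset  = ProcessedImageDataset(raw_eval,  image_processor)
# test_dataset  = ProcessedImageDataset(raw_test,  image_processor)

If I went this way, I suppose collator_fn would no longer be needed, since the default collator can stack the already-processed tensors.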

Thank you