Failing fine-tuning OWL-ViT

thaiza · November 11, 2022, 1:40am

Hi, I am trying to fine-tune the OWL-ViT model on a personal dataset, since the current model is not finding the bounding boxes I need.

I used the code below and I consistently get the error shown after it.
(Yes, I am aware that I passed the same data for validation and test in the Trainer object, but that was only to get a hello-world example of OWL-ViT fine-tuning running.)

The code:

import transformers
from transformers import AutoFeatureExtractor
from transformers import AutoTokenizer

import requests
from PIL import Image
import torch

from transformers import CLIPProcessor, CLIPModel, CLIPTokenizer
from datasets import load_dataset
from transformers import OwlViTProcessor, OwlViTForObjectDetection, OwlViTFeatureExtractor
from transformers import TrainingArguments, Trainer
from torchvision.transforms import Compose, Normalize, RandomResizedCrop, ColorJitter, ToTensor
import numpy as np
from datasets import load_metric


feature_extractor = OwlViTFeatureExtractor.from_pretrained("google/owlvit-base-patch32")

processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")

tokenizer = AutoTokenizer.from_pretrained("google/owlvit-base-patch32")

model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

######### PREPROCESS DATASET ############

normalize = Normalize(mean=feature_extractor.image_mean, std=feature_extractor.image_std)
_transforms = Compose(
    [RandomResizedCrop(feature_extractor.size), ColorJitter(brightness=0.5, hue=0.5), ToTensor(), normalize]
)

# Create the dataset based on structure of the folder

dataset_valves = load_dataset("imagefolder", data_dir="../my_dataset_2")

# The function applies the image transforms (including normalization) and then passes the text to the OwlViTProcessor

def pre_process_dataset(example):
    return processor(images=_transforms(example['image'].convert("RGB")), text=[str(example['label'])])

dataset_valves_ = dataset_valves['train']
preprocessed_datasets = dataset_valves_.map(pre_process_dataset) 
preprocessed_datasets2 = preprocessed_datasets.remove_columns(['image','label']) #remove the unnecessary columns

######### PREPARE AND RUN TRAINER ############

metric = load_metric("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=preprocessed_datasets2 ,
    eval_dataset=preprocessed_datasets2 ,
    compute_metrics=compute_metrics,
)

trainer.train()

And the error:
RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [1, 1, 3, 768, 768]

Could anyone, please, help me?

@Neil46 @adirik

PS: I tried to follow this tutorial, adapted for OWL-ViT: Google Colab

thaiza · November 12, 2022, 2:38pm

@Neil46 @adirik

adirik · November 15, 2022, 12:47pm

Hi @thaiza, thanks for the question!
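A note on the shape in the error, offered as a plausible reading rather than a confirmed diagnosis: the processor is called on a single image inside `map`, so each stored example likely already carries a leading batch dimension of 1, and the collator then adds another one when it builds the batch. The sketch below reproduces that with NumPy stand-ins; the array sizes are shrunk for illustration and the variable names are hypothetical, not part of the original script.

```python
import numpy as np

# Stand-in for the per-example pixel values produced for ONE image.
# Assumption: the leading 1 is a batch dimension added per call.
# (The post's error shows 768x768; 8x8 is used here to keep the demo small.)
per_example = np.zeros((1, 3, 8, 8), dtype=np.float32)

# A collator typically stacks the per-example arrays along a NEW batch axis.
batched_wrong = np.stack([per_example])
print(batched_wrong.shape)  # 5D: one dimension too many for conv2d

# Dropping the per-example batch dimension before collation gives a 4D batch.
batched_ok = np.stack([per_example[0]])
print(batched_ok.shape)
```

If this reading is right, removing the extra dimension in the preprocessing function (for example by indexing the first element of the returned pixel values) should get past the conv2d error, although training would still need a loss, as discussed next.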

OWL-ViT uses the bipartite matching loss introduced in DETR, but the loss terms are not implemented yet. I can take a look at your code, but you can also expect to see the training/fine-tuning code and an official tutorial shortly.
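To illustrate what bipartite matching means here, below is a minimal sketch of the matching step only. It is not the OWL-ViT or DETR implementation: the cost uses only an L1 distance between boxes (DETR's matching cost also includes class and generalized IoU terms), the function names are hypothetical, and the brute-force search is only practical for a handful of boxes. In practice an assignment solver such as `scipy.optimize.linear_sum_assignment` would typically be used instead.

```python
from itertools import permutations


def l1_box_cost(pred, target):
    """L1 distance between two boxes given as (cx, cy, w, h) tuples."""
    return sum(abs(p - t) for p, t in zip(pred, target))


def match_boxes(pred_boxes, target_boxes):
    """Find the one-to-one assignment of targets to predictions with minimal total cost.

    Returns (pairs, total_cost), where pairs is a list of
    (pred_index, target_index) tuples ordered by target index.
    Brute force over all assignments: for illustration only.
    """
    if len(pred_boxes) < len(target_boxes):
        raise ValueError("need at least as many predictions as targets")

    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(pred_boxes)), len(target_boxes)):
        cost = sum(
            l1_box_cost(pred_boxes[p], target_boxes[t]) for t, p in enumerate(perm)
        )
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [(p, t) for t, p in enumerate(best_perm)], best_cost


# Three predicted boxes, two ground-truth boxes (made-up values).
preds = [(0.5, 0.5, 0.2, 0.2), (0.1, 0.1, 0.1, 0.1), (0.9, 0.9, 0.3, 0.3)]
targets = [(0.1, 0.1, 0.1, 0.1), (0.52, 0.5, 0.2, 0.2)]

pairs, total_cost = match_boxes(preds, targets)
print(pairs, total_cost)
```

Once predictions are matched to targets, the loss terms are computed on the matched pairs, and unmatched predictions are treated as "no object".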

Hope this helps

xray1111 · February 20, 2023, 9:01am

Hi @adirik, I'm also very interested in this topic. May I ask roughly when you will release the fine-tuning code for OWL-ViT?

kopyl · April 11, 2023, 6:04pm

Also very interested. Can't wait to use it.
