How to fine-tune a vision model on a custom dataset?

I'm trying to use the Trainer in transformers to fine-tune a model, but I don't know how to feed the data into the model; the docs only show an NLP example. Can anyone show me an example of training on a custom image dataset?

I tried passing in a self-implemented Dataset class, but got "

ValueError: The batch received was empty, your model won’t be able to train on it. Double-check that your training dataset contains keys expected by the model: pixel_values, labels,…

"

pixel_values is the preprocessed image, I assume - from the limited information you have given, it's an array of pixel values (channels × height × width) that the model takes as input.

Usually you don't create that yourself; you use a dataset preparation function (an image processor) to extract those features from a given image.

But as you neither named the model nor gave us any information beyond an error message, this is hard to answer :wink:
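
For illustration, this is roughly how pixel_values are normally produced (a minimal sketch; I'm using a ViT checkpoint as a stand-in, since you didn't name your model):

    from PIL import Image
    from transformers import AutoImageProcessor

    # Any vision checkpoint works here; ViT is only a placeholder.
    processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")

    image = Image.open("example.jpg")
    inputs = processor(image, return_tensors="pt")

    # The processor resizes and normalizes the image into a float tensor.
    print(inputs["pixel_values"].shape)  # e.g. torch.Size([1, 3, 224, 224])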

My model is

vinvino02/glpn-nyu

It is a monocular depth estimation model: the input is an RGB image and the output is a depth map. So my code is below:

    import numpy as np
    import torch
    from pathlib import Path
    from PIL import Image

    class depth_class(torch.utils.data.Dataset):
        def exr2npy(self, path):
            # Read the single-channel ("R") float depth map from an EXR file.
            import OpenEXR, Imath
            exr_file = OpenEXR.InputFile(path)
            depth_str = exr_file.channel("R", Imath.PixelType(Imath.PixelType.FLOAT))
            width = exr_file.header()["dataWindow"].max.x - exr_file.header()["dataWindow"].min.x + 1
            height = exr_file.header()["dataWindow"].max.y - exr_file.header()["dataWindow"].min.y + 1
            depth_data = np.frombuffer(depth_str, dtype=np.float32).reshape((height, width))
            return depth_data

        def __init__(self, folder_id):
            rgb_path = Path('../Desktop/record_pulpwood/_{}_h_re/EXR_RGBD/rgb'.format(folder_id))
            d_path = Path('../Desktop/record_pulpwood/_{}_h_re/EXR_RGBD/depth'.format(folder_id))
            # extract_number is a helper defined elsewhere that sorts frames numerically.
            self.img_path = sorted(rgb_path.glob('*.jpg'), key=extract_number)
            self.depth_path = sorted(d_path.glob('*.exr'), key=extract_number)

        def __getitem__(self, idx):
            img_ = Image.open(self.img_path[idx])
            # image_processor is the AutoImageProcessor created below.
            img = image_processor(img_, return_tensors="pt").pixel_values
            depth = self.exr2npy(str(self.depth_path[idx]))
            depth = torch.tensor(depth)
            # return {'pixel_values': img, 'label': depth}
            return img, depth

        def __len__(self):
            return len(self.img_path)

    training_dataset = depth_class(58)
    test_dataset = depth_class(58 + 1)
...
...
    from transformers import (AutoImageProcessor, AutoModelForDepthEstimation,
                              Trainer, TrainingArguments)
    from datasets import Features, Dataset

    checkpoint = "vinvino02/glpn-nyu"
    image_processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = AutoModelForDepthEstimation.from_pretrained(checkpoint)

    training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=training_dataset,
        eval_dataset=test_dataset,
        compute_metrics=eval_depth,  # eval_depth is my metric function, defined elsewhere
    )
    trainer.train()
    model.save_pretrained('./glp_test')
    eval_results = trainer.evaluate()
    print(eval_results)
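
For context, Trainer calls compute_metrics with an EvalPrediction object holding predictions and label_ids; eval_depth is along these lines (a minimal sketch with an illustrative RMSE metric, assuming predictions and labels share a shape):

    import numpy as np

    def eval_depth(eval_pred):
        # eval_pred.predictions: predicted depth maps; eval_pred.label_ids: ground truth.
        preds = np.asarray(eval_pred.predictions)
        labels = np.asarray(eval_pred.label_ids)
        rmse = np.sqrt(np.mean((preds - labels) ** 2))
        return {"rmse": float(rmse)}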

What should I do if I want to pass the depth map to the model as a feature (the labels)?
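
Judging from the error message, each dataset item apparently has to be a dict whose keys match the model's forward() arguments, so __getitem__ should presumably return something like this (a minimal sketch; the labels key name is taken from the error text, and I haven't verified it against GLPN):

    def __getitem__(self, idx):
        img_ = Image.open(self.img_path[idx])
        # squeeze(0) drops the batch dimension the processor adds, so the
        # Trainer's default collator can stack the items itself.
        pixel_values = image_processor(img_, return_tensors="pt").pixel_values.squeeze(0)
        depth = torch.tensor(self.exr2npy(str(self.depth_path[idx])))
        # Keys must match the names from the error: pixel_values and labels.
        return {"pixel_values": pixel_values, "labels": depth}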