Do I need to override prediction_step in my customized Trainer?

I’m using CLIPModel to train a CLIP model from scratch on my own dataset.
I set up the model like this:

    from transformers import CLIPConfig, CLIPModel

    configuration = CLIPConfig()      # defaults match OpenAI's ViT-B/32 CLIP
    model = CLIPModel(configuration)  # randomly initialized, trained from scratch
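
For context, this is roughly how I build the batches that end up in the trainer below; the column names and the collate function are simplified stand-ins for my actual preprocessing, not my exact code:

    from transformers import CLIPProcessor

    # Tokenizer + image processor only; the checkpoint is just for the
    # preprocessing config, the model weights above stay randomly initialized.
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def collate_fn(examples):
        # `examples` is a list of dicts with (hypothetical) "text" and "image" fields
        texts = [ex["text"] for ex in examples]
        images = [ex["image"] for ex in examples]
        batch = processor(text=texts, images=images, padding=True, return_tensors="pt")
        # rename input_ids -> captions to match what my compute_loss expects
        return {
            "captions": batch["input_ids"],
            "pixel_values": batch["pixel_values"],
        }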

I realized the default Trainer is not suitable for CLIP, so I set up a customized trainer like this:

    from transformers import Trainer

    class CLIPTrainer(Trainer):
        def compute_loss(self, model, inputs, return_outputs=False):
            captions = inputs['captions']          # tokenized text (input_ids)
            pixel_values = inputs['pixel_values']  # preprocessed images
            outputs = model(captions, pixel_values, return_dict=True, return_loss=True)
            return (outputs, outputs.loss) if return_outputs else outputs.loss

However, at the evaluation step, I get the following error:

Traceback (most recent call last):
  File "run_clip.py", line 239, in <module>
    main()
  File "run_clip.py", line 233, in main
    metrics = trainer.evaluate()
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 3029, in evaluate
    output = eval_loop(
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 3210, in evaluation_loop
    loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 3467, in prediction_step
    loss = loss.mean().detach()
AttributeError: 'CLIPOutput' object has no attribute 'mean'
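
Looking at the lines around the failure in trainer.py, the base prediction_step seems to do roughly the following (a simplified paraphrase from my reading of the source, not the exact code):

    # roughly what transformers.Trainer.prediction_step does when it computes the loss
    with torch.no_grad():
        # the base class expects compute_loss to hand back (loss, outputs)
        # when it is called with return_outputs=True
        loss, outputs = self.compute_loss(model, inputs, return_outputs=True)
    loss = loss.mean().detach()   # the line 3467 that raises the AttributeError above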

So I figured there was something wrong with the prediction_step used by my trainer, and I rewrote the trainer as:

    from typing import Any, Dict, List, Optional, Tuple, Union

    import torch
    from torch import nn
    from transformers import Trainer

    class CLIPTrainer(Trainer):
        def compute_loss(
            self,
            model: nn.Module,
            inputs: Dict[str, Union[torch.Tensor, Any]],
            return_outputs: bool = False,
        ):
            captions = inputs['captions']
            pixel_values = inputs['pixel_values']
            outputs = model(captions, pixel_values, return_dict=True, return_loss=True)
            return (outputs, outputs.loss) if return_outputs else outputs.loss

        def prediction_step(
            self,
            model: nn.Module,
            inputs: Dict[str, Union[torch.Tensor, Any]],
            prediction_loss_only: bool,
            ignore_keys: Optional[List[str]] = None,
        ) -> Tuple[Optional[torch.Tensor], Optional[torch.Tensor], Optional[torch.Tensor]]:
            captions = inputs["captions"]
            pixel_values = inputs["pixel_values"]

            with torch.no_grad():
                outputs = model(captions, pixel_values, return_dict=True, return_loss=True)
            return outputs.loss if prediction_loss_only else outputs

I don’t know whether that is the right thing to do, and I have the following questions:

1. In my training_args I set eval_steps to 100. Does that mean the validation set is used for evaluation every 100 steps? (A rough sketch of my settings is just below.)
2. If it does, why do I only get the error above after the whole training run has finished, rather than at the first evaluation?
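
For reference, this is roughly how I configure and launch the trainer. Apart from eval_steps=100 the values are placeholders, and train_ds, eval_ds, and collate_fn stand in for my own dataset objects and collator:

    from transformers import TrainingArguments

    training_args = TrainingArguments(
        output_dir="clip-from-scratch",     # placeholder
        evaluation_strategy="steps",
        eval_steps=100,                     # the setting my first question is about
        per_device_train_batch_size=64,     # placeholder
        num_train_epochs=3,                 # placeholder
        remove_unused_columns=False,        # so the Trainer does not drop my custom columns
    )

    trainer = CLIPTrainer(
        model=model,
        args=training_args,
        train_dataset=train_ds,
        eval_dataset=eval_ds,
        data_collator=collate_fn,
    )
    trainer.train()
    metrics = trainer.evaluate()            # this is the call shown in the traceback above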
I’m new to Hugging Face, thank you guys.