I’m using CLIPModel to train a CLIP model on my dataset from scratch.
I set up my model like this:
from transformers import CLIPConfig, CLIPModel

configuration = CLIPConfig()  # defaults correspond to OpenAI's ViT-B/32 CLIP model
model = CLIPModel(configuration)
I realized the default Trainer is not suitable for CLIP, so I set up a custom trainer like this:
class CLIPTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        captions = inputs['captions']
        pixel_values = inputs['pixel_values']
        outputs = model(captions, pixel_values, return_dict=True, return_loss=True)
        return (outputs, outputs.loss) if return_outputs else outputs.loss
However, during the evaluation step, I get the following error message:
Traceback (most recent call last):
  File "run_clip.py", line 239, in <module>
    main()
  File "run_clip.py", line 233, in main
    metrics = trainer.evaluate()
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 3029, in evaluate
    output = eval_loop(
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 3210, in evaluation_loop
    loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
  File "/root/miniconda3/lib/python3.8/site-packages/transformers/trainer.py", line 3467, in prediction_step
    loss = loss.mean().detach()
AttributeError: 'CLIPOutput' object has no attribute 'mean'
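A likely cause, judging from the traceback: the stock prediction_step unpacks the result of compute_loss(..., return_outputs=True) as `loss, outputs`, so a compute_loss that returns (outputs, loss) hands it a CLIPOutput where it expects a tensor. Below is a minimal sketch of the return order the Trainer seems to expect. The function name clip_compute_loss and the stand-in fake_clip_model are hypothetical and exist only so the snippet runs without transformers installed; it also assumes inputs['captions'] already holds tokenized input IDs.

```python
from types import SimpleNamespace


def clip_compute_loss(model, inputs, return_outputs=False):
    """Body for a Trainer.compute_loss override: the loss comes first in the tuple."""
    outputs = model(
        input_ids=inputs["captions"],  # assumption: captions are token IDs
        pixel_values=inputs["pixel_values"],
        return_loss=True,
        return_dict=True,
    )
    loss = outputs.loss
    # Trainer.prediction_step unpacks `loss, outputs = self.compute_loss(...)`,
    # so the order must be (loss, outputs), not (outputs, loss).
    return (loss, outputs) if return_outputs else loss


# Hypothetical stand-in for CLIPModel so the sketch runs on its own.
def fake_clip_model(input_ids, pixel_values, return_loss=True, return_dict=True):
    return SimpleNamespace(loss=0.5, logits_per_image=[[1.0]], logits_per_text=[[1.0]])


batch = {"captions": [[1, 2, 3]], "pixel_values": [[0.0]]}
loss, outputs = clip_compute_loss(fake_clip_model, batch, return_outputs=True)
loss_only = clip_compute_loss(fake_clip_model, batch)
```

In a real CLIPTrainer, the same body would go inside the compute_loss method, with the actual model passed in by the Trainer.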
So I figured something was wrong with the prediction_step function in my trainer, and I rewrote it as:
from typing import Any, Dict, List, Optional, Tuple, Union

import torch
from torch import nn
from transformers import Trainer

class CLIPTrainer(Trainer):
    def compute_loss(
        self, model: nn.Module,
        inputs: Dict[str, Union[torch.Tensor, Any]],
        return_outputs: bool = False):
        captions = inputs['captions']
        pixel_values = inputs['pixel_values']
        outputs = model(captions, pixel_values, return_dict=True, return_loss=True)
        return (outputs, outputs.loss) if return_outputs else outputs.loss

    def prediction_step(
        self,
        model: nn.Module,
        inputs: Dict[str, Union[torch.Tensor, Any]],
        prediction_loss_only: bool,
        ignore_keys: Optional[List[str]] = None,
    ) -> Tuple[Optional[torch.Tensor], Optional[torch.Tensor], Optional[torch.Tensor]]:
        captions = inputs["captions"]
        pixel_values = inputs["pixel_values"]
        with torch.no_grad():
            outputs = model(captions, pixel_values, return_dict=True, return_loss=True)
        return outputs.loss if prediction_loss_only else outputs
I don’t know if that is the right thing to do.
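One thing to check in this override: the traceback shows evaluation_loop unpacking `loss, logits, labels = self.prediction_step(...)`, so returning a bare loss or a bare CLIPOutput would probably fail at that unpacking. A minimal sketch of a three-element return follows. The names clip_prediction_step and fake_clip_model are hypothetical, the stand-in model exists only so the snippet runs without torch or transformers, and captions are again assumed to be tokenized input IDs.

```python
from types import SimpleNamespace


def clip_prediction_step(model, inputs, prediction_loss_only):
    """Body for a Trainer.prediction_step override: always a (loss, logits, labels) tuple."""
    # In the real override, wrap this call in `with torch.no_grad():`.
    outputs = model(
        input_ids=inputs["captions"],  # assumption: captions are token IDs
        pixel_values=inputs["pixel_values"],
        return_loss=True,
        return_dict=True,
    )
    loss = outputs.loss  # with a real tensor, use outputs.loss.detach()
    if prediction_loss_only:
        return (loss, None, None)
    logits = (outputs.logits_per_image, outputs.logits_per_text)
    # The contrastive loss uses no label tensor, so the third slot stays None.
    return (loss, logits, None)


# Hypothetical stand-in for CLIPModel so the sketch runs on its own.
def fake_clip_model(input_ids, pixel_values, return_loss=True, return_dict=True):
    return SimpleNamespace(loss=0.5, logits_per_image=[[1.0]], logits_per_text=[[1.0]])


batch = {"captions": [[1, 2, 3]], "pixel_values": [[0.0]]}
full = clip_prediction_step(fake_clip_model, batch, prediction_loss_only=False)
loss_only = clip_prediction_step(fake_clip_model, batch, prediction_loss_only=True)
```

Since the traceback shows the default prediction_step already reaching the loss computation, correcting the tuple order in compute_loss may be enough on its own, in which case this override might not be needed.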
I also have the following questions:
In my training_args, I set eval_steps to 100. Does that mean the validation set is used for evaluation every 100 steps?
If so, why did I only get the error message above after the whole training process had finished?
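A possible explanation for both questions: eval_steps only takes effect when the evaluation strategy is set to "steps", and the default strategy is "no". The traceback also shows the failing call coming from an explicit trainer.evaluate() in main() (run_clip.py, line 233), which would run once after training ends. The sketch below lists the relevant keyword arguments as a plain dict so it runs without transformers; the output_dir value is hypothetical, and newer transformers releases name the strategy argument eval_strategy instead.

```python
# Keyword arguments intended for transformers.TrainingArguments.
training_kwargs = dict(
    output_dir="clip-from-scratch",  # hypothetical output path
    evaluation_strategy="steps",     # default is "no", which skips in-training evaluation
    eval_steps=100,                  # evaluate every 100 optimizer steps
)

# With transformers installed, these would be passed as:
# from transformers import TrainingArguments
# training_args = TrainingArguments(**training_kwargs)
```

The Trainer also needs an eval_dataset for in-training evaluation to have anything to run on.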
I’m new to Hugging Face. Thank you all.