Hi 
I am trying to create a custom model using the transformers library, specifically the ViT model.
To have a baseline, my goal is to replicate the ViTForImageClassification class by only adding a linear layer on top of the ViTModel output.
However, I run into a problem during the evaluation phase when I use my ViTCustom model. If I set evaluation_strategy="no" in TrainingArguments(), the model updates its parameters and the training loss decreases to levels similar to those of ViTForImageClassification (a good sanity check). But when I set evaluation_strategy="steps", the run fails with the following error as soon as evaluation starts:
/usr/local/lib/python3.7/dist-packages/transformers/optimization.py:310: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  FutureWarning,
***** Running training *****
  Num examples = 1451
  Num Epochs = 4
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 364
 [101/364 00:30 < 01:20, 3.28 it/s, Epoch 1.10/4]
Step	Training Loss	Validation Loss
***** Running Evaluation *****
  Num examples = 170
  Batch size = 8
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-59-5bdd51e9c139> in <module>
----> 1 train_results = trainer.train()
      2 trainer.save_model()
      3 trainer.log_metrics("train", train_results.metrics)
      4 trainer.save_metrics("train", train_results.metrics)
      5 trainer.save_state()
8 frames
/usr/local/lib/python3.7/dist-packages/transformers/trainer_pt_utils.py in nested_detach(tensors)
    158     if isinstance(tensors, (list, tuple)):
    159         return type(tensors)(nested_detach(t) for t in tensors)
--> 160     return tensors.detach()
    161 
    162 
AttributeError: 'NoneType' object has no attribute 'detach' 
Does anyone who has faced a similar issue have an idea why the ViTCustom model returns a 'NoneType' in evaluation mode, which then raises an error when the Trainer tries to detach the tensors?
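
As far as I can tell from the traceback, nested_detach walks through the tuple of model outputs and calls .detach() on every element, so a single None inside that tuple would be enough to trigger the error. Below is a minimal sketch of that mechanism in plain Python. FakeTensor and try_detach are hypothetical helpers I made up for illustration (FakeTensor stands in for torch.Tensor), and the assumption that one output element is None is only my guess, not something I have confirmed in my model:

```python
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor; only detach() is modelled."""

    def detach(self):
        return self


def nested_detach(tensors):
    # Same logic as the lines of trainer_pt_utils.py shown in the traceback.
    if isinstance(tensors, (list, tuple)):
        return type(tensors)(nested_detach(t) for t in tensors)
    return tensors.detach()


def try_detach(outputs):
    """Return "ok", or the error message raised while detaching outputs."""
    try:
        nested_detach(outputs)
        return "ok"
    except AttributeError as err:
        return str(err)


logits = FakeTensor()
print(try_detach((logits,)))       # every element is a tensor -> ok
print(try_detach((None, logits)))  # assumed case: one output element is None
# -> 'NoneType' object has no attribute 'detach'
```

If this is indeed what happens, the question becomes which element of my model's output is None during evaluation but not during training.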
best,
Cristóbal