Can you provide an example of best practices for incorporating a pretrained HuggingFace Vision Transformer (ViT) into a PyTorch Lightning module?

While there are numerous examples and notebooks showing how to run and fine-tune pretrained models like Vision Transformers (ViT), I’m looking for a clear example of how to integrate a pretrained ViT into a PyTorch Lightning pipeline. Specifically:

  • Should I instantiate the AutoImageProcessor (via AutoImageProcessor.from_pretrained) inside my pl.LightningModule, or would it be better to do so in my pl.LightningDataModule?
  • Should I implement my own forward method in the LightningModule, or should I simply delegate to the forward method of the pretrained model (stored as an attribute of my Lightning class)?