I am currently trying to implement image classification using I-JEPA. The I-JEPA paper by Assran et al., with Yann LeCun as a co-author ([2301.08243] Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture), shows that the learned representations can be used for image classification, which made me want to explore it for this task. However, I’m confused about how to actually implement it.
In the GitHub repository, I find it hard to see how to modify the model to add a linear classifier. I am also unclear about how to re-train the model on my own data. The repository provides pre-trained checkpoints, but I don't understand how to use them for my purpose.
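For context, my current (possibly wrong) understanding is that a linear classifier on top of I-JEPA would look roughly like the PyTorch sketch below: freeze the pre-trained encoder, average-pool its patch tokens, and train only an `nn.Linear` head with cross-entropy. `LinearProbe`, `strip_prefix`, and `train_one_epoch` are my own hypothetical helpers, not part of the I-JEPA repository, and the assumption that the encoder returns patch tokens shaped `(B, N, D)` is mine.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearProbe(nn.Module):
    """Frozen pre-trained encoder plus a trainable linear head.

    Assumption: `encoder(images)` returns patch tokens shaped (B, N, D),
    i.e. a ViT without a [CLS] token; the tokens are average-pooled.
    """

    def __init__(self, encoder: nn.Module, embed_dim: int, num_classes: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # keep the pre-trained weights fixed
        self.encoder.eval()
        self.head = nn.Linear(embed_dim, num_classes)

    def train(self, mode: bool = True):
        # Keep the frozen encoder in eval mode even while the head trains.
        super().train(mode)
        self.encoder.eval()
        return self

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            tokens = self.encoder(images)  # (B, N, D)
        features = tokens.mean(dim=1)      # (B, D), average over patches
        return self.head(features)         # (B, num_classes)


def strip_prefix(state_dict: dict, prefix: str = "module.") -> dict:
    """Remove a DistributedDataParallel-style prefix from checkpoint keys."""
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }


def train_one_epoch(model, loader, optimizer, device="cpu") -> float:
    """One pass over (images, labels) batches; returns the mean loss."""
    model.train()
    total = 0.0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        loss = F.cross_entropy(model(images), labels)
        optimizer.zero_grad()
        loss.backward()   # gradients reach only the linear head
        optimizer.step()
        total += loss.item()
    return total / max(len(loader), 1)
```

If I read the repository correctly (unverified), encoders are built with constructors such as `vit_huge` in `src/models/vision_transformer.py`, and checkpoints store weights under keys like `target_encoder`, possibly with a `module.` prefix that `strip_prefix` would remove before `load_state_dict`. Only the head's parameters would go to the optimizer, e.g. `torch.optim.SGD(probe.head.parameters(), lr=0.01)`. I have not confirmed that the paper pools tokens this way.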
Could anyone with I-JEPA experience help me understand the process? Any guidance on adapting the model for image classification, and on using the pre-trained checkpoints, would be greatly appreciated.
Looking forward to your suggestions and guidance. Thanks in advance!