Top performer for image classification

As of today, which model is the most accurate choice for fine-tuning on a custom dataset for NSFW image classification across a few labels?


If detection is all that is required, ViT may be sufficient. If detailed information needs to be extracted, an approach using a multimodal model such as JoyCaption could also be considered.

@John6666, I don't need extraction. I don't even need detection to identify parts; I just need a super accurate label for the image as a whole. I'm looking for 98% accuracy. I have about 40k images per label. Which ViT model would you recommend?
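For what it's worth, here is a minimal sketch of how the 98% target could be checked on a held-out split once any candidate model is fine-tuned. It assumes predictions and ground-truth labels are available as plain Python lists; the function names, the label strings, and the choice to require the target per label as well as overall are my own assumptions for illustration, not something from a particular library.

```python
from collections import Counter


def per_label_accuracy(y_true, y_pred):
    """Return (overall accuracy, {label: fraction of that label predicted correctly})."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    # How many held-out images carry each ground-truth label.
    totals = Counter(y_true)
    # How many of those were predicted correctly, keyed by ground-truth label.
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    overall = sum(correct.values()) / len(y_true)
    per_label = {label: correct[label] / n for label, n in totals.items()}
    return overall, per_label


def meets_target(y_true, y_pred, target=0.98):
    """True only if overall accuracy AND every per-label accuracy reach the target."""
    overall, per_label = per_label_accuracy(y_true, y_pred)
    return overall >= target and all(acc >= target for acc in per_label.values())


if __name__ == "__main__":
    # Hypothetical labels and predictions, just to show the call shape.
    truth = ["safe", "safe", "explicit", "explicit"]
    preds = ["safe", "safe", "explicit", "safe"]
    print(per_label_accuracy(truth, preds))
    print(meets_target(truth, preds))
```

Checking each label separately matters even with balanced data (about 40k images per label), because a model can reach a high overall number while one label lags behind.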
