As of today, which model is the absolute best and most accurate for fine-tuning with a custom dataset for NSFW image classification across a few labels?
If detection is all that is required, ViT may be sufficient. If detailed information needs to be extracted, an approach using a multimodal model such as JoyCaption could also be considered.
@John6666, I don't need extraction. I don't even need detection to identify parts; I just need a super accurate label for the image as a whole. I'm looking for 98% accuracy, and I have about 40k images per label. Which ViT model would you recommend?
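Whichever ViT checkpoint you pick, you need a held-out set to tell whether the 98% target is actually met. Below is a minimal stdlib-only sketch, assuming your ~40k images per label can be listed as `(path, label)` pairs. It holds out a stratified validation split and checks the target both overall and for each label. The label names, file paths, and the helpers `stratified_split`, `accuracy_report`, and `meets_target` are hypothetical. In practice `y_pred` would come from running your fine-tuned classifier on the validation images, and no model code is shown here.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.1, seed=0):
    """Split (path, label) pairs so each label keeps the same validation fraction."""
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))
    rng = random.Random(seed)  # fixed seed -> reproducible split
    train, val = [], []
    for label in sorted(by_label):
        items = list(by_label[label])
        rng.shuffle(items)
        n_val = round(len(items) * val_fraction)
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val


def accuracy_report(y_true, y_pred):
    """Return overall accuracy and per-label accuracy (recall for each true label)."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    overall = sum(correct.values()) / len(y_true)
    per_label = {label: correct[label] / total[label] for label in total}
    return overall, per_label


def meets_target(overall, per_label, target=0.98):
    """True only if the overall accuracy AND every individual label reach the target."""
    return overall >= target and min(per_label.values()) >= target


if __name__ == "__main__":
    labels = ["label_a", "label_b", "label_c"]  # hypothetical label names
    samples = [(f"{lab}/{i}.jpg", lab) for lab in labels for i in range(40_000)]
    train, val = stratified_split(samples, val_fraction=0.1)
    print(len(train), len(val))  # 108000 12000

    # Placeholder predictions; in practice these come from the fine-tuned model on `val`.
    y_true = [label for _, label in val]
    y_pred = list(y_true)
    overall, per_label = accuracy_report(y_true, y_pred)
    print(overall, meets_target(overall, per_label))
```

Checking each label separately matters because an overall 98% can hide one label that performs much worse than the others.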