Hey everyone, I'm new here. Hope you guys don't mind answering some basic questions.
I’ve been exploring NSFW AI detection lately, and it’s been a pretty fascinating rabbit hole. Tools like NSFWJS are great for quick setups, and the CLIP-based NSFW Detector is super impressive with how it uses embeddings to classify content.
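As far as I can tell, the embedding approach boils down to something like this: run the image through CLIP's image encoder, then feed the resulting vector to a small classifier head. Here's a minimal sketch of that idea (the MLP head and its layer sizes are my own guesses, not the actual detector's architecture, and "example.jpg" is just a placeholder):

from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Small classifier head over the 512-dim CLIP image embedding
# (hypothetical sizes, not the real detector's architecture)
head = torch.nn.Sequential(
    torch.nn.Linear(512, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
    torch.nn.Sigmoid(),
)

image = Image.open("example.jpg")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    embedding = model.get_image_features(**inputs)  # shape: (1, 512)
    nsfw_score = head(embedding)  # head is untrained here, so the score is meaningless until trained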
Recently, I came across a site called soulfun.ai (which is all about creative AI stuff, including AI-generated photos and videos), and it got me thinking: how can I fine-tune these models for more niche or specific datasets?
I’ve been playing around with a basic CLIP setup, and here’s a quick snippet of what I’ve tried so far:
from transformers import CLIPProcessor, CLIPModel
from PIL import Image
import torch

# Load the pre-trained CLIP model and its processor
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Set up inputs: an image plus the two candidate text labels
image = Image.open("example.jpg")
inputs = processor(text=["NSFW", "SFW"], images=image, return_tensors="pt", padding=True)

# Forward pass (no gradients needed for inference)
with torch.no_grad():
    outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # image-text similarity scores, shape (1, 2)
probs = logits_per_image.softmax(dim=1)      # probabilities over ["NSFW", "SFW"]

# Check if NSFW: index 0 corresponds to the "NSFW" prompt
is_nsfw = probs[0][0].item() > 0.5
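That covers zero-shot, but for actual fine-tuning the direction I've been considering is a linear probe: freeze CLIP, precompute image embeddings over a labeled dataset, and train only a small head on top. A rough sketch of what I mean (the embeddings and labels below are random placeholders; in practice they'd come from model.get_image_features over my dataset, and I haven't validated that this setup performs well):

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: precomputed CLIP image embeddings and 0/1 labels
embeddings = torch.randn(1000, 512)
labels = torch.randint(0, 2, (1000,)).float()
loader = DataLoader(TensorDataset(embeddings, labels), batch_size=64, shuffle=True)

# Linear probe: CLIP stays frozen, only this head is trained
head = torch.nn.Linear(512, 1)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = torch.nn.BCEWithLogitsLoss()

for epoch in range(10):
    for batch_embeddings, batch_labels in loader:
        logits = head(batch_embeddings).squeeze(1)
        loss = loss_fn(logits, batch_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()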
For those of you who've fine-tuned a CLIP-based model for NSFW detection (or something similar), please let me know:
- What kind of datasets worked best for you?
- Did you use any specific tricks during training to improve accuracy?
- Any tips for keeping the model fast and lightweight during inference? (One thing I've tried is caching the text embeddings, as shown in the sketch after this list.)
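On that last question, my current idea is to precompute the text embeddings once at startup so each request only runs the image encoder. A sketch of what I mean (again, "example.jpg" is a placeholder); does this match what others do in production?

from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Compute the text embeddings once at startup instead of on every request
text_inputs = processor(text=["NSFW", "SFW"], return_tensors="pt", padding=True)
with torch.no_grad():
    text_features = model.get_text_features(**text_inputs)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

def classify(image):
    # Per request, only the image encoder runs
    image_inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        image_features = model.get_image_features(**image_inputs)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    logits = model.logit_scale.exp() * image_features @ text_features.T
    return logits.softmax(dim=1)  # probabilities over ["NSFW", "SFW"]

probs = classify(Image.open("example.jpg"))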
Would love to hear what’s worked for you! Thanks in advance for any advice.