NSFW image detection

How can I improve the F1-score and accuracy of an NSFW image detection model? Current performance is only 80%, even with the fine-tuned model.
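To make the two metrics concrete, here is a minimal sketch of how F1 and accuracy could be measured on a held-out validation set, and how the decision threshold could be swept to see whether the default cut-off is part of the problem. It assumes the model outputs one NSFW probability per image and that labels are 0 (safe) or 1 (NSFW); the function names `f1_and_accuracy` and `best_threshold` are hypothetical and the numbers in the usage lines are made-up toy values, not results from any real model.

```python
def f1_and_accuracy(labels, scores, threshold):
    """Return (f1, accuracy) when scores >= threshold are flagged NSFW (label 1)."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    accuracy = (tp + tn) / len(labels) if labels else 0.0
    return f1, accuracy


def best_threshold(labels, scores, candidates=None):
    """Pick the candidate threshold with the highest F1 (first one wins on ties)."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]  # 0.01 .. 0.99
    return max(candidates, key=lambda t: f1_and_accuracy(labels, scores, t)[0])


# Toy validation data (assumed, for illustration only).
val_labels = [0, 0, 1, 1]
val_scores = [0.10, 0.40, 0.35, 0.80]

t = best_threshold(val_labels, val_scores)
print(t, f1_and_accuracy(val_labels, val_scores, t))
```

The threshold should be chosen on validation data and then checked on a separate test set, otherwise the reported F1 will likely be optimistic.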

Is there a performance penalty when scoring one image at a time versus combining images and scoring them as a batch?
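A rough way to check this is to time both modes on the same images and confirm that the scores match. The sketch below assumes the model can be wrapped in a function that takes a list of images and returns one score per image; `score_in_batches` and `dummy_score_batch` are hypothetical names, and the dummy scorer is only a stand-in so the example runs without a real model.

```python
import time


def score_in_batches(score_batch, images, batch_size):
    """Score images in chunks of batch_size; return (scores, seconds per image)."""
    scores = []
    start = time.perf_counter()
    for i in range(0, len(images), batch_size):
        scores.extend(score_batch(images[i:i + batch_size]))
    elapsed = time.perf_counter() - start
    return scores, elapsed / max(len(images), 1)


def dummy_score_batch(batch):
    """Stand-in for a real model call (assumption): mean pixel value per image."""
    return [sum(img) / (len(img) or 1) for img in batch]


# Toy "images" as flat lists of pixel values (illustration only).
images = [[0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

single_scores, single_time = score_in_batches(dummy_score_batch, images, batch_size=1)
batch_scores, batch_time = score_in_batches(dummy_score_batch, images, batch_size=3)

print(single_scores == batch_scores, single_time, batch_time)
```

With a real model, a warm-up call before timing and several repeated runs would give more reliable numbers than a single measurement.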