Model for classifying a single large object based on small details on the object


I have a dataset of large images, each containing a single large object. The object has small details on it, and I want to classify it into one of 6 classes based on those details. I do not know exactly where on the object the details are, but they are visible in the image. I'm looking for a model (for example, a vision transformer) that can help me classify these images. I've tried ResNet50, but the results were not good.
Any suggestions for which model I should use?