I’m starting to work on zero-shot instance segmentation with CLIP and Detic.
My classes are very specific, and I’m wondering whether it would be possible to explain to CLIP what each class means in a 3-4 line paragraph.
I tried using classes with descriptive names like “paper that has been crumpled, torn”, which helps the model perform better. But it would be great to give further details.
Is it possible to do that with CLIP? If not, does such a model exist?
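
To make the question concrete, here is a rough sketch of the kind of workaround I have in mind: split the paragraph into short sentences, embed each one separately, and average the normalized embeddings into a single class vector. As far as I understand, CLIP's text encoder only accepts a limited context (77 tokens), which is why I assume a long paragraph would have to be split. `toy_encode_text` is a hypothetical stand-in I made up so the snippet runs on its own; it is not a CLIP or Detic function, and a real text encoder would have to replace it.

```python
import hashlib
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def toy_encode_text(text, dim=8):
    """Hypothetical stand-in for a real text encoder (NOT CLIP).

    Maps a string to a deterministic unit vector derived from its hash,
    so the sketch can run without any model weights.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return l2_normalize([b / 255.0 - 0.5 for b in digest[:dim]])


def class_embedding(description, encode=toy_encode_text):
    """Average per-sentence embeddings of a long class description."""
    # Naive sentence split on full stops; each piece is meant to be short
    # enough for the encoder's context window.
    sentences = [s.strip() for s in description.split(".") if s.strip()]
    if not sentences:
        raise ValueError("description contains no sentences")
    embeddings = [encode(s) for s in sentences]
    dim = len(embeddings[0])
    mean = [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]
    # Re-normalize so the result can be compared by cosine similarity.
    return l2_normalize(mean)


description = (
    "Paper that has been crumpled. "
    "Paper that has been torn. "
    "Visible creases or ripped edges."
)
crumpled_paper = class_embedding(description)
```

The resulting vector would then be used in place of the single class-name embedding, but I don't know whether averaging like this preserves the extra detail.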