Hello. I’m a beginner here and I’m really unsure about the licensing of CLIP and all the fine-tuned versions of CLIP. For example, sentence-transformers’ clip-ViT-B-32-multilingual-v1 has an Apache 2.0 license, but it is based on OpenAI’s CLIP, which has an MIT license on GitHub yet whose model card states that "Any deployed use case of the model - whether commercial or not - is currently out of scope.". This is really confusing, because I need to use open-source models for a project (it probably won’t be commercialized, but if it were, would this be a problem?).
I’m currently working on a project where I need to handle both text and images and perform similarity searches. CLIP gives me a headache because I’m really confused about whether I can use it for this kind of thing or not. My other option (I don’t know much about this, though) would be to use two separate models (one for text embeddings, one for image embeddings), keep them in two different embedding spaces, run each search in the appropriate space (either text or images), and then somehow combine the results, but I’m not sure how to proceed.
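To make the two-model idea concrete, here is a minimal sketch of what I have in mind. The embeddings are just random stand-ins (in practice they would come from a real text encoder and a real image encoder with different dimensionalities), the model names are not specified, and the reciprocal rank fusion step is only one possible way to combine the two per-space rankings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for real embeddings. In practice these would come from two
# separate models (a text encoder and an image encoder), each producing
# vectors in its own, incompatible space with its own dimensionality.
text_emb = rng.normal(size=(100, 384))  # 100 items in the text space
img_emb = rng.normal(size=(100, 512))   # the same 100 items in the image space

def cosine_search(query, corpus, k=5):
    """Top-k cosine-similarity search within a single embedding space."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = c @ q
    idx = np.argsort(-sims)[:k]
    return idx, sims[idx]

def fuse_ranks(rankings, k=60):
    """Reciprocal rank fusion: merge per-space rankings into one list.

    Scores are 1/(k + rank), so items ranked highly in several
    spaces float to the top without comparing raw similarities
    across incompatible spaces.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Search each space separately, then fuse the two result lists.
text_idx, _ = cosine_search(text_emb[0], text_emb)
img_idx, _ = cosine_search(img_emb[0], img_emb)
fused = fuse_ranks([list(text_idx), list(img_idx)])
print(fused[:5])
```

Since item 0 is used as the query in both spaces, it ranks first in both lists and therefore comes out on top after fusion. Is rank-level fusion like this a reasonable way to combine the two spaces, or is there a better approach?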