CLIP like contrastive vision-language models for Spanish with pre-trained text and vision models

Awesome! Let’s define the project :slight_smile:

1 Like