Please read the topic category description to understand what this is all about
Description
One of the most exciting developments in 2021 was the release of OpenAI’s CLIP model, which was trained on a variety of (text, image) pairs. One of the cool things you can do with this model is use it for text-to-image and image-to-image search (similar to what is possible when you search for images on your phone).
The goal of this project is to experiment with CLIP and learn about multimodal models. Several ideas can be explored, including:
Create a text-to-image search engine that allows users to search for images based on natural language queries. Although CLIP was only trained on English text, you can use techniques like Multilingual Knowledge Distillation to extend the embeddings to new languages.
Create an image-to-image search engine that returns similar images, given a “query” image.
A common dataset that’s used for image demos is the Unsplash Dataset. You can get access to it here
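Both search ideas above come down to the same step: embed the query (text or image) and the candidate images into a shared vector space, then rank the images by similarity to the query. Here is a minimal sketch of that ranking step, assuming the embeddings have already been computed. In a real project they would come from CLIP (for example via `get_text_features` / `get_image_features` on the Transformers `CLIPModel`); the 3-dimensional vectors, file names, and the `cosine_similarity` / `search` helpers below are made up for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def search(query_embedding, image_embeddings, k=3):
    """Return the k (image name, score) pairs most similar to the query."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in image_embeddings.items()
    ]
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-ins for CLIP image embeddings (hypothetical values)
image_embeddings = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.1],
    "city.jpg": [0.0, 0.2, 0.9],
}

# Toy stand-in for the embedding of a text query or a query image
query_embedding = [0.8, 0.2, 0.1]

results = search(query_embedding, image_embeddings, k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

The same `search` function covers text-to-image and image-to-image search, since only the source of `query_embedding` changes. For a dataset the size of Unsplash, a plain Python loop like this would likely be too slow, so a vectorized or indexed nearest-neighbour search would be the natural next step.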
Challenges
This project goes beyond the concepts introduced in Part II of the Course, so some familiarity with computer vision would be useful. Having said that, the Transformers API is similar for image tasks, so if you know how the pipeline() function works, then you’ll have no trouble adapting to this new domain.
Desired project outcomes
Create a Streamlit or Gradio app on Spaces that allows a user to find images that resemble a natural language query or input image.
Don’t forget to push all your models and datasets to the Hub so others can build on them!
Awesome, 4 people already! You can head over to Discord if you want to coordinate / chat etc
I’ve added the project name in this topic’s description
Hey @RobotJelly, I think we already have 4 people in this project (the team limit), so you can either join this similar project or work on this one by yourself
Hi @RobotJelly, the only constraint is that we’re reserving the Amazon SageMaker compute for teams, so if you have your own GPUs / cloud provider, then you’re more than welcome to work on this project by yourself
Oh ok @lewtun, actually I don’t have any cloud provider service, so hmm… I had no idea that a team is really needed, and that’s why I’ve also submitted the form with this project title
If those aren’t an option for you, I recommend checking through the #course:course-event category and seeing if there’s an idea that interests you and doesn’t have anyone signed up (just check the comments).
Alternatively you are more than welcome to propose a project of your own!
@RobotJelly Let’s do the image search project together as a second group? Is that ok @lewtun? Or should we select a completely new project? This relates to an idea I have connected to my current job.
I think it’s okay to have a second group working on this. I have created a team for you and @RobotJelly. If anyone else joins you, the code is use-openais-clip-for-image-search-group2 (when filling in the name of the project in the AWS form).
Hi @marcelcastrobr @sgugger, this sounds like a very interesting project. Can I also join your second team on the use-openais-clip-for-image-search-group2 project?