I have a use case where I want to predict an image based on some text.
For each image, I have the image itself + tags.
An example use case: you start with images of emojis + tags for those emojis.
The result would be that the model suggests one or more emojis based on a sentence that the user provides.
I do not have any training data. The idea is that I embed the images + tags into a model and test with sentences.
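To make the setup I have in mind a bit more concrete, here is a rough sketch of the kind of zero-shot matching I mean. I'm using CLIP from the transformers library purely as a placeholder embedder; the model name, image files, tags and sentence below are made up for illustration:

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

# Placeholder model; any multimodal embedder with an image and a text encoder would do.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

emoji_images = [Image.open("smile.png"), Image.open("fire.png")]  # hypothetical emoji images
emoji_tags = ["smiling face, happy, joy", "fire, lit, hot"]       # hypothetical tag strings

with torch.no_grad():
    # Embed the images and their tags in the same space.
    image_emb = model.get_image_features(**processor(images=emoji_images, return_tensors="pt"))
    tag_emb = model.get_text_features(**processor(text=emoji_tags, return_tensors="pt", padding=True))

    # One embedding per emoji: the average of its (normalised) image and tag embeddings.
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    tag_emb = tag_emb / tag_emb.norm(dim=-1, keepdim=True)
    emoji_emb = (image_emb + tag_emb) / 2
    emoji_emb = emoji_emb / emoji_emb.norm(dim=-1, keepdim=True)

    # Embed the user's sentence and rank the emojis by cosine similarity.
    sentence = "That concert last night was amazing"
    sent_emb = model.get_text_features(**processor(text=[sentence], return_tensors="pt", padding=True))
    sent_emb = sent_emb / sent_emb.norm(dim=-1, keepdim=True)

    scores = sent_emb @ emoji_emb.T   # shape: (1, number of emojis)
    top = scores.squeeze(0).topk(k=2)
    print(top.indices, top.values)    # indices of the suggested emojis + their scores
```

Averaging the image and tag embeddings is just the simplest way I could think of to combine the two, so that part is very much open as well.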
What would be a good starting point for embedding the images + tags in a model?
I was thinking about multimodal modeling with Perceiver IO, for example the Perceiver_for_Multimodal_Autoencoding.ipynb notebook in NielsRogge/Transformers-Tutorials on GitHub. Would that be a good fit here?
Any other suggestions?
Thanks,
Vincent