Generate GIF reply to English text with VQGAN + CLIP

So this would be a hybrid dialog system combining intent recognition with text-to-image generation. The idea is to train the model to generate visual replies (static or dynamic GIFs, depending on how hard the latter is) to English-language text input. E.g.:

text input: ‘this restaurant is the bomb’
model output: generated picture of exploding restaurant

There are quite a few GIF and text-to-image datasets out there, but we’d need to combine them in a way that lets us train the model on them. A large part of the work will be to implement VQGAN and CLIP in JAX.
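
To make the VQGAN + CLIP idea concrete, here is a minimal JAX sketch of CLIP-guided generation: a latent vector is optimized by gradient descent so that the decoded image's embedding moves toward the text embedding. Everything here is an assumption for illustration: the "decoder" and "image encoder" are random linear maps standing in for VQGAN and CLIP, the text embedding is a random vector, and names like `init_toy_models` and `generate_latent` are hypothetical.

```python
import jax
import jax.numpy as jnp

# Toy sizes (assumptions for illustration; the real models are far larger).
LATENT_DIM = 32
IMAGE_DIM = 64
EMBED_DIM = 16
LEARNING_RATE = 0.05


def init_toy_models(key):
    """Random linear stand-ins for a VQGAN decoder and a CLIP image encoder."""
    k_dec, k_img = jax.random.split(key)
    return {
        "decoder": jax.random.normal(k_dec, (LATENT_DIM, IMAGE_DIM)) / jnp.sqrt(LATENT_DIM),
        "image_encoder": jax.random.normal(k_img, (IMAGE_DIM, EMBED_DIM)) / jnp.sqrt(IMAGE_DIM),
    }


def normalize(x):
    """Scale a vector to unit length."""
    return x / (jnp.linalg.norm(x) + 1e-8)


def clip_loss(latent, params, text_embedding):
    """1 - cosine similarity between the image and text embeddings."""
    image = jnp.tanh(latent @ params["decoder"])  # "decode" latent to a flat image
    image_embedding = normalize(image @ params["image_encoder"])
    return 1.0 - jnp.dot(image_embedding, normalize(text_embedding))


@jax.jit
def update(latent, params, text_embedding):
    """One gradient-descent step on the latent only; the models stay frozen."""
    loss, grad = jax.value_and_grad(clip_loss)(latent, params, text_embedding)
    return latent - LEARNING_RATE * grad, loss


def generate_latent(key, text_embedding, steps=200):
    """Optimize a random latent toward the text embedding; return it with the loss history."""
    k_params, k_latent = jax.random.split(key)
    params = init_toy_models(k_params)
    latent = 0.1 * jax.random.normal(k_latent, (LATENT_DIM,))
    losses = []
    for _ in range(steps):
        latent, loss = update(latent, params, text_embedding)
        losses.append(float(loss))
    return latent, losses
```

Calling `generate_latent(jax.random.PRNGKey(0), text_embedding)` with any vector of length `EMBED_DIM` returns the optimized latent and the loss per step. In a real setup the two linear maps would be replaced by pretrained VQGAN and CLIP models, and the text embedding would come from CLIP's text encoder.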

I’m also interested in this task :grinning_face_with_smiling_eyes:
I would like to join forces if you are interested. I’m pretty confident in NLP.
I know CLIP and (some of) the VQGAN model, but I don’t know JAX.
My github: cceyda
Also see this dataset

This sounds really cool!

CLIP is already available in JAX :slight_smile: transformers/modeling_flax_clip.py at master · huggingface/transformers · GitHub

This sounds great!

We’re also working on a text-to-image widget, so it could be fun to try it out with the proposed project :slight_smile:

https://github.com/huggingface/huggingface_hub/issues/113

BTW I think it would be fun to try text → emoji/emoticon generation (as images, not Unicode characters).
Given how differently each platform renders emoticons, there is enough variation to generate the ultimate DeepEmoji set. That would be an easier starting point than reaction GIFs… ooh, we could also animate emojis…
Also :hugs: :wink:

This proposed project is interesting.
My interest is computer vision, but I have no experience with JAX so far.
Anyway, I love to learn, and I’d like to work on a project that connects NLP with computer vision tasks.
So I would like to join this project as well.
My time zone is Tokyo (GMT+9)

You might want to check out julien-c/reactiongif · Datasets at Hugging Face. It could be useful :smiley:

It looks like the version on datasets is missing the latest GIF files update :blush: See: Code to download gifs? · Issue #1 · bshmueli/ReactionGIF · GitHub
But no worries, I already grabbed the files and can open a PR to add them if there are no license issues… I’m not sure about that yet.

I like this idea a lot and would be interested in joining!

Very interesting, I would love to contribute to this project. I have experience in CV as well as in NLP.

hi everyone, thanks for your replies! :slight_smile:
I’m very busy at work at the moment, I’ll have a look at the suggestions over the weekend…

Any suggestions on collaboration format are more than welcome. We need to accommodate the different timezones (I’m in CET), but it would be nice to kick off with a (well-prepared) live chat.

Awesome! I’d like to contribute to this project. I have experience fine-tuning dialog models.

It’s a great and very cool idea. I have experience with both vision and NLP, and I have read the paper “Taming Transformers for High-Resolution Image Synthesis” and used its code, so I think I could contribute positively to this project :grinning:.

Sounds very interesting. I have experience with generative models and fine-tuning LMs, hope I can contribute in some way!

Hey everyone, just to get things started I’ve created this github repository with the comments from the posts: gif-generation/README.md at main · braadbaart/gif-generation · GitHub. I think I found everyone’s github profiles except bharat and osanseviero - send me a message on github if you want access.

We can use the GitHub repository to discuss issues and datasets (through the issues feature), and add implementations. Just add your code / ideas / comments to the repo and we’ll see who ends up doing what. This is of course not the most efficient way to go about it, so let’s evaluate whether we can settle on a design and split up the work in a more sensible way in the next couple of days…

I have the next four weeks off, so I hope to be able to spend some time on this :slight_smile: I work mostly on NLP, and CV is quite new to me. Lots of experience with TF/Keras and PyTorch, not so much with JAX.

3 Likes

Hey guys:)

Here’s a JAX implementation of VQGAN: GitHub - patil-suraj/vqgan-jax: JAX implementation of VQGAN

Feel free to use/modify it for this project.
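
For anyone new to VQGAN, the step that gives it its name is vector quantization: each encoder output vector is replaced by its nearest entry in a learned codebook. The sketch below shows that lookup in JAX with a straight-through gradient, as commonly done in VQ models. It is an illustration only and is not taken from the repository above; the function name `quantize` and the array shapes are assumptions.

```python
import jax
import jax.numpy as jnp


def quantize(latents, codebook):
    """Replace each latent vector with its nearest codebook entry (squared L2).

    latents:  (n, d) encoder outputs
    codebook: (k, d) codebook vectors
    Returns the quantized vectors (n, d) and the chosen indices (n,).
    """
    # Pairwise squared distances via ||a||^2 - 2 a.b + ||b||^2, shape (n, k).
    sq_dists = (
        jnp.sum(latents**2, axis=1, keepdims=True)
        - 2.0 * latents @ codebook.T
        + jnp.sum(codebook**2, axis=1)
    )
    indices = jnp.argmin(sq_dists, axis=1)
    quantized = codebook[indices]
    # Straight-through estimator: the forward pass uses the quantized values,
    # while the backward pass copies gradients to the latents unchanged.
    quantized_st = latents + jax.lax.stop_gradient(quantized - latents)
    return quantized_st, indices
```

The returned indices are the discrete "image tokens" that a text-conditioned model could be trained to predict, while the quantized vectors are what the decoder turns back into pixels.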

let’s officially define this project :slight_smile:

Putting everybody in the official sheet here. More people can still join! Leave a comment here or on the sheet if you want to change something. If there are more than 10 people we will split it into two groups, as managing more than 10 people will be a bit difficult.

Hey everyone!
This seems very exciting. I would love to join in and contribute.
I have prior experience in NLP and have also been working on some image reconstruction in CV recently.

On Discord I have created a channel named #clip-reply. Please join us there and we can start discussing~

This one sounds like a super interesting project!