Generate GIF reply to English text with VQGAN + CLIP

braadbaart · June 23, 2021, 7:38pm

So this would be a hybrid dialog system combining intent recognition with text-to-image generation. The idea is to train the model to generate visual replies (static or dynamic GIFs, depending on how hard the latter is) to English-language text input. E.g.:

text input: ‘this restaurant is the bomb’
model output: generated picture of exploding restaurant

There are quite a few GIF and text-to-image datasets out there, but we’d need to combine them in a way that lets us train the model on them. A large part of the work will be to implement VQGAN and CLIP in JAX.
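For context, CLIP scores how well an image matches a text by comparing their embeddings, and VQGAN + CLIP setups typically use that score to steer generation. Below is a minimal sketch of just the scoring step, in plain Python with made-up vectors. The embeddings, the candidate names, and the `best_reply` helper are all hypothetical; a real version would take its embeddings from CLIP and would probably use jax.numpy instead of lists.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_reply(text_embedding, candidates):
    """Return the name of the candidate whose embedding is closest to the text."""
    return max(
        candidates,
        key=lambda name: cosine_similarity(text_embedding, candidates[name]),
    )


# Toy, made-up embeddings (assumption: real ones would come from CLIP).
text_embedding = [0.9, 0.1, 0.3]  # stands in for 'this restaurant is the bomb'
candidates = {
    "exploding_restaurant": [0.8, 0.2, 0.4],
    "sleeping_cat": [0.1, 0.9, 0.0],
}

print(best_reply(text_embedding, candidates))
```

In a generation setup the same similarity would serve as the objective to maximise while updating the image, rather than as a way to rank a fixed set of candidates.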

ceyda · June 24, 2021, 5:14am

I’m also interested in this task.
I would like to join forces if you are interested. I’m pretty confident in NLP.
I know CLIP and (some of) the VQGAN model, but I don’t know JAX.
My github: cceyda
Also see this dataset

valhalla · June 24, 2021, 8:15am

This sounds really cool!

CLIP is already available in JAX: transformers/modeling_flax_clip.py at master · huggingface/transformers · GitHub

osanseviero · June 24, 2021, 9:55am

This sounds great!

We’re also working on a text-to-image widget, so it could be fun to try it out with the proposed project.

https://github.com/huggingface/huggingface_hub/issues/113

ceyda · June 24, 2021, 12:53pm

BTW I think it would be fun to try text → emoji/emoticon (image, not unicode).
If we think about all the different renderings of emoticons between platforms, there is reasonable variation to generate the ultimate DeepEmoji set. It would be an easier starting point than reaction gifs… ooh, we could also animate emojis…
Also

aiosym · June 24, 2021, 1:20pm

This proposed project is interesting.
My interest is Computer Vision, but I have no experience with JAX so far.
Anyway, I would love to learn and work on a project that relates NLP to Computer Vision tasks.
So I would like to join this project as well.
My time zone is Tokyo (GMT+9)

osanseviero · June 24, 2021, 4:08pm

You might want to check out julien-c/reactiongif · Datasets at Hugging Face. It could be useful.

ceyda · June 24, 2021, 4:41pm

It looks like the version on datasets is missing the latest gif files update (see Code to download gifs? · Issue #1 · bshmueli/ReactionGIF · GitHub).
But no worries, I already grabbed the files and I can make a PR to add them if there are no license issues… but I’m not sure.

vkumaresan · June 24, 2021, 10:51pm

I like this idea a lot and would be interested in joining!

Vaibhavbrkn · June 25, 2021, 3:05am

Very interesting, I would love to contribute to this project. I have experience in CV as well as in NLP.

braadbaart · June 25, 2021, 9:06am

Hi everyone, thanks for your replies!
I’m very busy at work at the moment, so I’ll have a look at the suggestions over the weekend…

Any suggestions on collaboration format are more than welcome. We need to accommodate the different timezones (I’m in CET), but it would be nice to kick off with a (well-prepared) live chat.

tree-park · June 25, 2021, 2:31pm

Awesome! I’d like to contribute to this project. I have experience fine-tuning dialog models.

bharatR · June 25, 2021, 7:59pm

It’s a great and very cool idea. I have experience with both vision and NLP, and I have read the paper “Taming Transformers for High-Resolution Image Synthesis” and also used the code, so I think I could contribute positively to this project.

jaketae · June 26, 2021, 4:34am

Sounds very interesting. I have experience with generative models and fine-tuning LMs, hope I can contribute in some way!

braadbaart · June 27, 2021, 9:41pm

Hey everyone, just to get things started I’ve created this github repository with the comments from the posts: gif-generation/README.md at main · braadbaart/gif-generation · GitHub. I think I found everyone’s github profiles except bharat and osanseviero - send me a message on github if you want access.

We can use the github repository to discuss issues and datasets (through the issues feature), and add implementations. Just add your code / ideas / comments to the repo and we’ll see who ends up doing what. This is of course not the most efficient way to go about it, so let’s evaluate if we can settle on a design and split up the work in a more sensible way in the next couple of days…

I have the next four weeks off, so I hope to be able to spend some time on this. I work mostly on NLP, and CV is quite new to me. Lots of experience with TF/Keras and PyTorch, not so much with JAX.

valhalla · June 28, 2021, 11:14am

Hey guys:)

Here’s a JAX implementation of VQGAN: GitHub - patil-suraj/vqgan-jax: JAX implementation of VQGAN

Feel free to use/modify it for this project.

valhalla · June 28, 2021, 4:28pm

Let’s officially define this project.

Putting everybody in the official sheet here. More people can still join! Leave a comment here or on the sheet if you want to change something. If there are more than 10 people we will split it into two groups, as managing more than 10 people will be a bit difficult.

shivam15s · June 29, 2021, 4:52am

Hey everyone!
This seems very exciting. I would love to join in and contribute.
I have prior experience in NLP and have also been working on some image reconstruction in CV recently.

ceyda · June 30, 2021, 4:47am

On Discord I have created a channel named #clip-reply. Please join there and we can start discussing~

khalidsaifullaah · June 30, 2021, 7:35am

This one sounds like a super interesting project!
