After some brainstorming, we came up with the following project ideas. We would love some feedback and opinions on them.
ViT + mBART - Multilingual Image Captioning (WIT pre-train)
ViT + mBERT - Multilingual Visual Question Answering (WIT/COCO pre-train, test on VQA/GQA)
Use CLIP/VQGAN for Image Synthesis. A project has already been proposed for GIF generation, so we could train on a different dataset/domain.
Possible modifications of CLIP/VQGAN:
Multilingual CLIP model for image/sentence matching + Image Generation using VQGAN for this dataset.
FashionCLIP - Train a model for fashion image-text matching, and a VQGAN trained to generate dresses/shirts/glasses based on a description. A suitable dataset may be hard to find, I guess.
Scene generation - CLIP + VQGAN, trained on a text-scene dataset for scene (movie/landscape) generation.
Alternatively, we could do similar things for video+text, following the example of VideoBERT and other transformers, maybe?
Please comment here if you are interested in collaboration. We are a team of 5 as of now (timezone GMT+5:30). Would also love any suggestions to improve these ideas.
@gchhablani
I am interested in the VQA project. I have domain experience in both vision and NLP (intermediate).
What kind of expertise are you looking for, if the team is not yet complete?
Thanks
Hi @knilakshan20
We will choose one of these topics after discussing. If you're up for other projects too, it would be great to have you with us. Once we have finalized a project, we will share another post on the forum. Otherwise, maybe you can decide then.
Hi @Sasikanth
Is there a specific one you are interested in? Which ones do you think are nice? And what can be improved? Any other suggestions that you have?
We will pick one of these and add another post. If enough people reply here, we can pick two projects from here and work on them separately. What do you think?
I am interested in the ViT + mBERT - Multilingual Visual Question Answering project. I have read a paper related to this (a bit old, published in 2019); their model is not limited to the VQA task, but we can fine-tune it. You can look at it here: GitHub - facebookresearch/vilbert-multi-task: Multi Task Vision and Language
Hey guys, these are great ideas! It would be nice if you could open different threads for different projects and comment there if you want to be a part of that project, so that we can keep track of teams and projects. It would be nice to do this before Wednesday.
Hi everyone!
Thanks for showing interest. Everyone has different interests/expertise, which could be very useful. However, since we only have one week, we can only pursue a few ideas.
I am thinking we can pick two of these. I am maintaining this small sheet, based on which we might be able to divide ourselves into two teams:
Please fill it in by tomorrow so that the teams can meet, decide on the specifics of the project by Wednesday, and post them here.
Right now we are a total of 14 people, so I am thinking a 7/7 split would be ideal, unless some of you aren't interested in this anymore and have found something better.
I find the ideas very interesting and would like to contribute to both projects: ViT + mBERT - Multilingual Visual Question Answering (WIT/COCO pre-train, test on VQA/GQA) and CLIP/VQGAN for Image Synthesis. I am really interested in collaborating and contributing to these projects.
Hi, I would also like to join the image captioning or the VQA topic, if it is still possible. I have some experience with NLP and TensorFlow, and I am located in GMT+2.