Image captioning for Spanish with pre-trained vision and text model

For this project, a pre-trained image model such as ViT can be used as an encoder, and a pre-trained text model such as BERT and/or GPT2 can be used as a decoder.

Model

Pre-trained ViT and BERT models can be found on the model hub. We cou…
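To make the encoder/decoder split in the project summary concrete, here is a minimal, dependency-free sketch of the data flow: the image is encoded once, and the decoder then produces the caption token by token, conditioned on the image features and on the tokens generated so far. Everything in it is an assumption for illustration: `toy_encoder` and `toy_decoder` are hypothetical stand-ins, not ViT, BERT, or GPT2, and the tiny Spanish vocabulary is made up. A real project would swap in pre-trained models from the model hub.

```python
from typing import Callable, List, Sequence

BOS, EOS = "<s>", "</s>"  # hypothetical start/end-of-sequence markers

Image = Sequence[Sequence[float]]  # toy grayscale image: rows of pixel values in [0, 1]


def toy_encoder(image: Image) -> List[float]:
    """Stand-in for a vision encoder such as ViT: one feature per row (the row mean)."""
    return [sum(row) / len(row) for row in image]


def toy_decoder(features: List[float], prefix: List[str]) -> str:
    """Stand-in for a text decoder: returns the next token.

    Like a real decoder, it looks at both the image features and the
    tokens generated so far; unlike a real one, it follows a fixed rule.
    """
    brightness = sum(features) / len(features)
    if len(prefix) == 1:  # only BOS so far
        return "un"
    if len(prefix) == 2:  # choose a noun based on the image features
        return "perro" if brightness > 0.5 else "gato"
    return EOS


def generate_caption(
    image: Image,
    encoder: Callable[[Image], List[float]] = toy_encoder,
    decoder: Callable[[List[float], List[str]], str] = toy_decoder,
    max_len: int = 10,
) -> str:
    """Greedy decoding: encode once, then append one token per step until EOS or max_len."""
    features = encoder(image)
    tokens = [BOS]
    for _ in range(max_len):
        next_token = decoder(features, tokens)
        if next_token == EOS:
            break
        tokens.append(next_token)
    return " ".join(tokens[1:])  # drop BOS


bright = [[0.9, 0.8], [0.7, 1.0]]
dark = [[0.1, 0.2], [0.0, 0.1]]
print(generate_caption(bright))
print(generate_caption(dark))
```

The point of the sketch is the shape of the loop, not the rules inside the stand-ins: the encoder runs once per image, while the decoder runs once per generated token.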

mrm8488 June 23, 2021, 3:54pm 3

So you want to create a CLIP-like (dual-encoder) model? @valhalla

