Image captioning for Spanish with pre-trained vision and text model

Hello @valhalla @patrickvonplaten ,
Hope you are doing well…
I am interested in this project as well… So much to learn from doing this project…It is similar to CLIP. But in Clip, we use encoders for both image & text… In this project, we are gonna use encoder for image & decoder for text.
Also, I have a suggestion that we could create a separate thread indexed with all JAX/FLAX learning resources together in 1 place…It will be useful for all of us to learn…
Cheers,
Sri Lakshmi

1 Like