Image captioning for Spanish with pre-trained vision and text model

Hi,
I would like to join this project. I am not experienced enough with NLP to create a project on my own. But I do have experience with Computer Vision overall, and would like to try and see how well spanish text is generated for the image-captioning.

I am David and I am in the Amsterdam timezone(CET). I have some experience with other frameworks, and training/evaluating object detection networks, but not with NLP. I can contribute by putting in my previous knowledge and discipline towards achieving this goal :slight_smile: