How to use an image tensor for caption generation with Transformer-XL or BERT?

gibbo0789 · September 1, 2020, 12:38pm

I am fairly new to transformers and deep learning in general, so please be kind.

I am currently working on a project that will caption images using either Transformer-XL or BERT. However, I am not sure how to pass the image tensor of shape [608, 608, 3] from my CNN to the transformer model for text generation. Can anyone help?

Please feel free to ask questions; I would be glad to assist in any way I can.

abhineet99 · November 27, 2020, 5:58am

Guess I’m late. Although I’m not an expert, I can give you some idea. You can use a network such as ResNet or DenseNet to ‘encode’ the image into a 1-D tensor, and then use this tensor to generate captions with a transformer.
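
To make the idea above a bit more concrete, here is a minimal PyTorch sketch, not a definitive implementation. Everything in it is an assumption for illustration: `TinyCaptioner` and `greedy_caption` are hypothetical names, the two-layer conv stack is only a stand-in for a pretrained ResNet/DenseNet backbone, the vocabulary size and the BOS/EOS token ids are made up, and the decoder is PyTorch's generic `nn.TransformerDecoder` rather than Transformer-XL or BERT. The model is untrained, so the token ids it produces are meaningless; the point is only to show how a [608, 608, 3] image can be turned into something a transformer decoder can attend to.

```python
import torch
import torch.nn as nn


class TinyCaptioner(nn.Module):
    """Hypothetical CNN encoder + transformer decoder; all sizes are illustrative."""

    def __init__(self, vocab_size=1000, d_model=64, max_len=32, pool=False):
        super().__init__()
        self.pool = pool
        # Stand-in for a ResNet/DenseNet backbone: 608x608 -> 152x152 -> 19x19.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(16, d_model, kernel_size=3, stride=8, padding=1), nn.ReLU(),
        )
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerDecoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=128, batch_first=True
        )
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def encode(self, images):
        # images: (B, H, W, 3) channels-last; Conv2d expects (B, 3, H, W).
        feats = self.cnn(images.permute(0, 3, 1, 2))      # (B, d_model, 19, 19)
        if self.pool:
            # One 1-D vector per image, used as a single "memory" token.
            return feats.mean(dim=(2, 3)).unsqueeze(1)    # (B, 1, d_model)
        # Otherwise keep the spatial grid as a sequence of feature vectors.
        return feats.flatten(2).transpose(1, 2)           # (B, 361, d_model)

    def forward(self, images, tokens):
        memory = self.encode(images)
        seq_len = tokens.size(1)
        pos = torch.arange(seq_len, device=tokens.device)
        tgt = self.tok_emb(tokens) + self.pos_emb(pos)
        # Causal mask: position i may only attend to positions <= i.
        causal = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=tokens.device),
            diagonal=1,
        )
        out = self.decoder(tgt, memory, tgt_mask=causal)  # cross-attends to image
        return self.lm_head(out)                          # (B, T, vocab_size)


@torch.no_grad()
def greedy_caption(model, image, bos_id=1, eos_id=2, max_len=16):
    """Greedy decoding for a single image; bos_id/eos_id are assumed token ids."""
    model.eval()
    tokens = torch.tensor([[bos_id]])
    for _ in range(max_len - 1):
        logits = model(image, tokens)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        tokens = torch.cat([tokens, next_id], dim=1)
        if next_id.item() == eos_id:
            break
    return tokens[0].tolist()


if __name__ == "__main__":
    model = TinyCaptioner()
    image = torch.rand(1, 608, 608, 3)  # random stand-in for a real image
    print(greedy_caption(model, image))  # token ids; map to words with a tokenizer
```

With `pool=True` the image becomes a single 1-D vector per example, which is the variant described above; with `pool=False` the decoder instead attends over the 19 x 19 grid of CNN features, which keeps some spatial information. In practice you would swap the conv stack for a pretrained backbone and train the whole thing on image/caption pairs.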
