Image to text model that can take an additional text input

nadnadoni1234 · September 7, 2023, 9:05am

Hi, can anyone recommend an image-to-text model that accepts an additional text input, so I can give it some context before it generates the caption?

Sandy1857 · October 2, 2023, 4:21pm

I think you can look at the MatCha or DePlot models. You can pass in text along with the image, but I doubt it will have any significant effect on the output, even though they were trained with a text input as well.
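
In case it helps, here is a minimal sketch of what passing text along with the image could look like. It assumes the `google/deplot` checkpoint and the Pix2Struct classes in 🤗 Transformers, which is how DePlot is usually loaded. The helper names (`load_captioner`, `compare_prompts`, `prompt_has_effect`) are made up for illustration. It runs the same image with several prompts so you can check for yourself whether the text changes anything.

```python
from typing import Any, Callable, Dict, Iterable

# Assumption: the DePlot checkpoint on the Hub. A MatCha checkpoint
# should load through the same Pix2Struct classes.
DEFAULT_MODEL = "google/deplot"


def load_captioner(model_name: str = DEFAULT_MODEL,
                   max_new_tokens: int = 512) -> Callable[[Any, str], str]:
    """Load the model once and return a function (image, prompt) -> text."""
    # Imported lazily so the helpers below work without transformers installed.
    from transformers import (Pix2StructForConditionalGeneration,
                              Pix2StructProcessor)

    processor = Pix2StructProcessor.from_pretrained(model_name)
    model = Pix2StructForConditionalGeneration.from_pretrained(model_name)

    def caption(image: Any, prompt: str) -> str:
        # The text prompt is passed to the processor together with the image.
        inputs = processor(images=image, text=prompt, return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        return processor.decode(output_ids[0], skip_special_tokens=True)

    return caption


def compare_prompts(image: Any, prompts: Iterable[str],
                    caption_fn: Callable[[Any, str], str]) -> Dict[str, str]:
    """Run the same image with each prompt and collect the outputs."""
    return {prompt: caption_fn(image, prompt) for prompt in prompts}


def prompt_has_effect(outputs: Dict[str, str]) -> bool:
    """True if at least two prompts produced different text."""
    return len(set(outputs.values())) > 1
```

Usage would be something like the following, where `chart.png` is a placeholder for your own image and the second prompt is just an example of extra context:

```python
from PIL import Image

caption = load_captioner()
image = Image.open("chart.png")
outputs = compare_prompts(
    image,
    ["Generate underlying data table of the figure below:",
     "This chart shows quarterly sales. Describe it:"],
    caption_fn=caption,
)
print(outputs, prompt_has_effect(outputs))
```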
