Hi, can anyone recommend an image-to-text model that accepts an additional text input as context before generating the caption?
Anyone please?