Hi, can anyone recommend an image-to-text model that accepts an additional text input as context before generating the caption?
Anyone, please?