Image-to-text model that can take additional text as input for context

Anyone, please? 🙂