I’ve seen this great Space by @nielsr: Comparing Captioning Models - a Hugging Face Space by nielsr. It includes GIT and BLIP models.
The `image-to-text` pipeline, however, currently only supports models based on a different architecture. Are there any plans to support GIT/BLIP models in the pipeline as well?
I’m trying to build a feature in an open-source library on top of this pipeline, and it would be great to be able to switch to these more modern, more performant models in the future.