Hello!
I’ve seen this great space by @nielsr: Comparing Captioning Models - a Hugging Face Space by nielsr. It includes GIT and BLIP models.
Currently, the image-to-text
pipeline only supports models based on a different architecture.
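For reference, this is how I'm using the pipeline today; a minimal sketch, assuming the `nlpconnect/vit-gpt2-image-captioning` checkpoint (a vision encoder-decoder model, which the pipeline does accept):

```python
from transformers import pipeline

# Image-to-text pipeline with a vision encoder-decoder checkpoint
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

# Image pipelines also accept a URL or local path directly
result = captioner("http://images.cocodataset.org/val2017/000000039769.jpg")
print(result)  # e.g. [{'generated_text': '...'}]
```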
Are there any plans to support GIT/BLIP models in the image-to-text
pipeline?
I’m trying to build a feature in an open-source library on top of this pipeline, and it would be great to be able to switch to these more modern, better-performing models in the future.
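In the meantime I'm calling BLIP through its model classes directly instead of the pipeline; a minimal sketch of that workaround, assuming the `Salesforce/blip-image-captioning-base` checkpoint:

```python
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the BLIP captioning model directly, bypassing the pipeline
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess the image and generate a caption
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(processor.decode(out[0], skip_special_tokens=True))
```

(GIT can be driven the same way through `AutoProcessor` and `GitForCausalLM`.) It works, but having this behind the pipeline API would make the integration much cleaner.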