Postprocess using CLIP

Hello, I wonder if there is a way to postprocess synthesized images with CLIP. For example, given the prompt “A photo of a chair”, the generated images are sometimes inconsistent with the prompt (e.g. a twisted chair), so I believe it would be helpful to postprocess them using an existing model like CLIP. Basically, we could generate 10 images and keep only the 5 with the highest CLIP scores against the text prompt.
I wonder if such functionality is already implemented. Thanks.

Hi @cnut1648, that’s certainly possible! It’s not implemented in the diffusers library, but you can do it yourself in exactly the way you describe. You can start by taking a look at the CLIP usage documentation and go from there. Please let us know if you have any questions :slight_smile:
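
To make the idea concrete, here is a minimal sketch of the reranking step, assuming the `transformers` implementation of CLIP and a list of PIL images that you have already generated. The helper names (`clip_scores`, `top_k_indices`, `keep_best`) and the `openai/clip-vit-base-patch32` checkpoint are my own choices for illustration, not something diffusers provides.

```python
from typing import List, Sequence


def clip_scores(images: Sequence, prompt: str,
                model_name: str = "openai/clip-vit-base-patch32") -> List[float]:
    """Return one CLIP image-text similarity score per PIL image."""
    # Imported lazily so the ranking helper below works without these packages.
    import torch
    from transformers import CLIPModel, CLIPProcessor

    # Assumption: this checkpoint is just an example; any CLIP checkpoint should work.
    model = CLIPModel.from_pretrained(model_name)
    processor = CLIPProcessor.from_pretrained(model_name)

    inputs = processor(text=[prompt], images=list(images),
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image has shape (num_images, num_texts); with one prompt, take column 0.
    return outputs.logits_per_image[:, 0].tolist()


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def keep_best(images: Sequence, prompt: str, k: int = 5) -> List:
    """Hypothetical helper: keep the k images that CLIP scores highest for the prompt."""
    scores = clip_scores(images, prompt)
    return [images[i] for i in top_k_indices(scores, k)]
```

With 10 generated images in a list called `images`, something like `keep_best(images, "A photo of a chair", k=5)` would return the 5 highest-scoring ones. Keep in mind that CLIP similarity is only a rough proxy for prompt consistency, so a subtly twisted chair may still score well.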