Hi, I am reading the Evaluating Diffusion Models page, which uses the CLIP score to evaluate a text-to-image model. I noticed that the CLIP score calculated there is larger than 1, while in the literature it is usually reported at around 0.3. Should I just divide the CLIP score by 100 to match the scale used in the literature?
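To make the question concrete, here is a minimal sketch of the rescaling I have in mind. It assumes (I have not confirmed this for that page) that the reported score is 100 times the cosine similarity between the CLIP image and text embeddings, clipped at zero. The function names and the toy two-dimensional vectors are made up for illustration and are not real CLIP outputs.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_score_100(image_emb, text_emb):
    """ASSUMED convention: 100 * cosine similarity, clipped at zero."""
    return max(100.0 * cosine_similarity(image_emb, text_emb), 0.0)


def to_unit_scale(score_100):
    """Undo the assumed factor of 100 to get back to the cosine scale."""
    return score_100 / 100.0


# Toy stand-ins for embeddings (hypothetical values, not model outputs).
image_emb = [1.0, 0.0]
text_emb = [0.6, 0.8]

score = clip_score_100(image_emb, text_emb)
rescaled = to_unit_scale(score)
print(score, rescaled)
```

If that assumption holds, dividing by 100 would just recover the clipped cosine similarity, which is what I would expect to compare against the values around 0.3.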