I have trouble understanding the following lines of code from the file /src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L692-L694
if do_classifier_free_guidance:
    noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
    noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
I get the part that, when we sample without negative prompts, noise_pred_uncond, as the name suggests, is the noise prediction from the unconditional distribution (the empty prompt ""), and the conditional-minus-unconditional difference (noise_pred_text - noise_pred_uncond) provides “guidance” for the sampling process toward the positive prompt.
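Just to make sure I read the formula correctly, here is a tiny numeric sketch (the tensor values are made up, only the arithmetic matters):

import torch

# made-up predictions for a single latent value
noise_pred_uncond = torch.tensor([0.10])  # UNet output for the empty prompt ""
noise_pred_text = torch.tensor([0.30])    # UNet output conditioned on the positive prompt

guidance_scale = 7.5
# extrapolate away from the unconditional prediction, in the direction of the text-conditioned one
noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)
print(noise_pred)  # tensor([1.6000]) -> pushed well past the conditional prediction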
However, when we sample with negative prompts, noise_pred_uncond becomes a prediction conditioned on the negative prompts, according to the implementation:
if do_classifier_free_guidance and negative_prompt_embeds is None:
    uncond_tokens: List[str]
    if negative_prompt is None:
        uncond_tokens = [""] * batch_size
    elif type(prompt) is not type(negative_prompt):
        raise TypeError(
            f"`negative_prompt` should be the same type to `prompt`, but got {type(negative_prompt)} !="
            f" {type(prompt)}."
        )
    elif isinstance(negative_prompt, str):
        uncond_tokens = [negative_prompt]
    elif batch_size != len(negative_prompt):
        raise ValueError(
            f"`negative_prompt`: {negative_prompt} has batch size {len(negative_prompt)}, but `prompt`:"
            f" {prompt} has batch size {batch_size}. Please make sure that passed `negative_prompt` matches"
            " the batch size of `prompt`."
        )
    else:
        uncond_tokens = negative_prompt
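So, if I understand the batching correctly, the negative-prompt embeddings (or the embeddings of "" when no negative prompt is given) are concatenated in front of the positive-prompt embeddings, so that a single UNet forward pass covers both, and the later chunk(2) splits the output back into those two halves. A rough sketch of what I think happens, with made-up tensor shapes:

import torch

batch_size, seq_len, dim = 1, 77, 768

# embeddings for the negative prompt (or "" when no negative prompt is given)
negative_prompt_embeds = torch.randn(batch_size, seq_len, dim)
# embeddings for the positive prompt
prompt_embeds = torch.randn(batch_size, seq_len, dim)

# the two sets of embeddings are concatenated into one batch
prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds])  # shape (2, 77, 768)

# ... single UNet forward pass on torch.cat([latents] * 2) with prompt_embeds ...
noise_pred = torch.randn(2 * batch_size, 4, 64, 64)  # stand-in for the UNet output

# chunk(2) therefore returns the negative-prompt ("uncond") half first
noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)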
I don’t really get the part where the guidance is added on top of a prediction conditioned on the negative prompts. Why don’t we add the guidance between the +ve and -ve predictions to an unconditional prediction instead? Shouldn’t we worry that the final image will possess features described by the negative prompts?
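In other words, the alternative I would have expected (the names here are my own and this is not what the pipeline does) would look something like:

import torch

# hypothetical alternative: three separate UNet predictions instead of two
noise_pred_empty = torch.randn(1, 4, 64, 64)  # conditioned on the empty prompt ""
noise_pred_pos = torch.randn(1, 4, 64, 64)    # conditioned on the positive prompt
noise_pred_neg = torch.randn(1, 4, 64, 64)    # conditioned on the negative prompt

guidance_scale = 7.5
noise_pred = noise_pred_empty + guidance_scale * (noise_pred_pos - noise_pred_neg)

(I realize this would need three UNet passes per step instead of two, but conceptually that is what I would have expected.)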
Thanks!