I want to add a non-textual condition to the UNet and fine-tune Stable Diffusion. What is the best way to do this? My current idea is to add the condition to the time embedding so that it is passed through the UNet together with the timestep information. Is there a better way?
While I’m at it, how can I modify the time embedding? In UNet2DConditionModel there doesn’t seem to be a straightforward way of doing so.
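
To make the time-embedding idea concrete, here is a rough sketch in plain PyTorch of what I have in mind. It is not the diffusers implementation: `ConditionedTimeEmbedding`, `sinusoidal_embedding`, `cond_dim`, and the sizes 320 and 1280 are illustrative assumptions, and zero-initialising the last condition layer is just one choice that I assume would leave the pretrained behaviour unchanged at the start of fine-tuning.

```python
import math

import torch
from torch import nn


def sinusoidal_embedding(timesteps: torch.Tensor, dim: int) -> torch.Tensor:
    """Sinusoidal timestep features of shape (batch, dim); dim is assumed even."""
    half = dim // 2
    exponent = torch.arange(half, dtype=torch.float32, device=timesteps.device) / half
    freqs = torch.exp(-math.log(10000.0) * exponent)
    args = timesteps.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)


class ConditionedTimeEmbedding(nn.Module):
    """Hypothetical module: time embedding plus a projected non-textual condition."""

    def __init__(self, cond_dim: int, base_dim: int = 320, embed_dim: int = 1280):
        super().__init__()
        self.base_dim = base_dim
        # MLP applied to the sinusoidal timestep features.
        self.time_mlp = nn.Sequential(
            nn.Linear(base_dim, embed_dim), nn.SiLU(), nn.Linear(embed_dim, embed_dim)
        )
        # Separate MLP that maps the condition vector to the same width.
        self.cond_mlp = nn.Sequential(
            nn.Linear(cond_dim, embed_dim), nn.SiLU(), nn.Linear(embed_dim, embed_dim)
        )
        # Zero-init the last condition layer so the sum initially equals
        # the plain time embedding.
        nn.init.zeros_(self.cond_mlp[-1].weight)
        nn.init.zeros_(self.cond_mlp[-1].bias)

    def forward(self, timesteps: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        t_emb = self.time_mlp(sinusoidal_embedding(timesteps, self.base_dim))
        # Additive conditioning: both terms have shape (batch, embed_dim).
        return t_emb + self.cond_mlp(cond)


# Usage with made-up sizes: a batch of 4 samples and an 8-dimensional condition.
embedder = ConditionedTimeEmbedding(cond_dim=8)
timesteps = torch.randint(0, 1000, (4,))
cond = torch.randn(4, 8)
emb = embedder(timesteps, cond)  # shape (4, 1280)
```

The resulting `emb` would then have to be fed to the UNet blocks in place of the usual time embedding, which is the part I don’t see how to do cleanly.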