Question about image conditioning in diffusion models

Hello, I’m a student and I’m trying to train an image-conditioned diffusion model. Most of the diffusion models I’ve seen use channel concatenation to feed in the image condition. I wondered what would happen without concatenation, so I simply added the condition image to the input (the input range goes from [0, 1] to [0, 2], because I normalize [0, 255] to [0, 1]) and subtracted it again after estimation to keep the estimated values in [0, 1]. Surprisingly, it behaves like a normal image-conditioned model, but the results look far more degraded than the condition I gave it. Why does this work at all? Are there any good papers on how to give conditions to diffusion models?
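To make the two options concrete, here is a rough sketch of concatenation versus addition. It is only an illustration under assumptions, not the actual training code: it uses NumPy with a channel-first layout, random arrays in [0, 1] in place of real images, and the names `x`, `cond`, and `fake_denoiser` are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a batch of 2 RGB images, 8x8 pixels, values in [0, 1].
x = rng.random((2, 3, 8, 8))      # stand-in for the model input
cond = rng.random((2, 3, 8, 8))   # stand-in for the condition image

# Option A: channel-wise concatenation (the common approach).
# The two images stay in separate channels, so the input has 6 channels.
concat_input = np.concatenate([x, cond], axis=1)

# Option B: element-wise addition (the approach described in the post).
# The channel count is unchanged, but the values now lie in [0, 2].
added_input = x + cond


def fake_denoiser(inp):
    """Placeholder for the trained network (assumption: identity mapping)."""
    return inp


# Subtract the condition from the output to bring the estimate back
# towards the original [0, 1] range, as described in the post.
estimate = fake_denoiser(added_input) - cond

print(concat_input.shape)  # (2, 6, 8, 8)
print(added_input.shape)   # (2, 3, 8, 8)
```

With concatenation the network receives the two images in separate channels, while with addition it only receives their sum in a single tensor.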