I’m new to generative AI and recently learned about diffusion models. My goal is to fine-tune an image-to-image diffusion model that can translate images into another style (such as a cartoon) while preserving details of the original image, like the face or body shape.
After researching how diffusion models work, I believe that adding spatially uneven noise to the image (e.g., 100% noise on the background, 25% on the face, and 50% on the body) and then fine-tuning the model on my desired output style should do the trick.
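To make the idea concrete, here is a rough sketch of what I mean by "uneven noise", written against a diffusers-style DDPM scheduler. `add_uneven_noise` and `strength_map` are names I made up for illustration; I'm not aware of an existing API that does this.

```python
import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(num_train_timesteps=1000)

def add_uneven_noise(latents: torch.Tensor, strength_map: torch.Tensor) -> torch.Tensor:
    # latents: (B, C, H, W) image or VAE latents
    # strength_map: (B, 1, H, W), values in [0, 1],
    # e.g. 1.0 on the background, 0.5 on the body, 0.25 on the face
    noise = torch.randn_like(latents)
    # Treat each pixel's strength as its own diffusion timestep.
    t = (strength_map * (scheduler.config.num_train_timesteps - 1)).long()
    abar = scheduler.alphas_cumprod.to(latents.device)[t]  # (B, 1, H, W)
    # Standard forward-noising formula, applied per pixel.
    return abar.sqrt() * latents + (1.0 - abar).sqrt() * noise
```

Is something along these lines reasonable, or is there a standard way to control per-region denoising strength?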
However, I’m unsure how to start. I have a small dataset of 100-200 sample images in the output style I want to achieve, but I don’t know how to fine-tune the model. Specifically, I’m unsure which pre-trained model to use for this task and how to feed in an input image with uneven noise. So far, all the models I’ve seen require an image and a mask, and I don’t want to use a binary black-and-white mask, since I want to preserve the original image’s details to different degrees in different regions.
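For context, this is the kind of image-plus-mask workflow I keep running into (a minimal sketch using the Hugging Face diffusers inpainting pipeline; the checkpoint, file names, and prompt are just examples):

```python
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Typical image + hard-mask workflow: the mask is effectively black-and-white,
# so a region is either kept or fully regenerated.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting"
)
init_image = Image.open("photo.png").convert("RGB").resize((512, 512))
mask = Image.open("mask.png").convert("L").resize((512, 512))

result = pipe(
    prompt="cartoon portrait",
    image=init_image,
    mask_image=mask,
).images[0]
result.save("out.png")
```

This is exactly what I'd like to avoid, because it doesn't let me control how strongly each region is altered.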
Any guidance on these topics would be greatly appreciated.