I’m planning to train the `stable-diffusion-2-inpainting` model using my own dataset consisting of `image.jpg` and `mask.jpg` pairs. I understand that the input should have 9 channels. Could you provide guidance on how to preprocess these image and mask files and structure them correctly for training? Specifically:
- What is the expected tensor structure for the 9-channel input?
- How should I combine my image and mask files to conform to this structure?
- Are there any specific preprocessing steps or code examples available?
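
For reference, here is a rough sketch of my current understanding, so it can be corrected if it is wrong. As far as I can tell, the 9 channels are built in latent space: 4 channels of noisy image latents, 1 channel of the mask downsampled to the latent resolution, and 4 channels of latents of the masked image, concatenated in that order. The sketch uses NumPy only, with random arrays standing in for the loaded JPEGs. `placeholder_encode` is a hypothetical stand-in for the real VAE encoder (it only reproduces the output shape), and I am assuming 512x512 inputs and that white mask pixels mark the region to inpaint.

```python
import numpy as np

def preprocess_pair(image_u8, mask_u8):
    """image_u8: (H, W, 3) uint8 RGB; mask_u8: (H, W) uint8, white = inpaint (assumed)."""
    # Scale the image to [-1, 1] and move channels first: (3, H, W).
    image = image_u8.astype(np.float32) / 127.5 - 1.0
    image = image.transpose(2, 0, 1)
    # JPEG masks contain compression artifacts, so threshold to a hard 0/1 mask.
    mask = (mask_u8.astype(np.float32) / 255.0 >= 0.5).astype(np.float32)[None]
    # Blank out the region to be inpainted.
    masked_image = image * (1.0 - mask)
    return image, mask, masked_image

def placeholder_encode(x):
    """Hypothetical stand-in for the VAE encoder: (3, H, W) -> (4, H/8, W/8).
    It only mimics the shape; real training would use the model's VAE."""
    c, h, w = x.shape
    pooled = x.reshape(c, h // 8, 8, w // 8, 8).mean(axis=(2, 4))
    return np.concatenate([pooled, pooled[:1]], axis=0)

def downsample_mask(mask):
    """Nearest-neighbour downsample of the mask to latent resolution."""
    return mask[:, ::8, ::8]

rng = np.random.default_rng(0)
# Synthetic stand-ins for image.jpg and mask.jpg.
image_u8 = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)
mask_u8 = np.zeros((512, 512), dtype=np.uint8)
mask_u8[128:384, 128:384] = 255

image, mask, masked_image = preprocess_pair(image_u8, mask_u8)

latents = placeholder_encode(image)
# Additive noise as a placeholder for the scheduler's noising step.
noisy_latents = latents + rng.standard_normal(latents.shape).astype(np.float32)
masked_latents = placeholder_encode(masked_image)
mask_small = downsample_mask(mask)

# 4 (noisy latents) + 1 (mask) + 4 (masked-image latents) = 9 channels.
unet_input = np.concatenate([noisy_latents, mask_small, masked_latents], axis=0)[None]
print(unet_input.shape)
```

If this is right, only the noisy latents change at each training step, while the mask and masked-image latents act as fixed conditioning for that sample.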
Any guidance or example code would be very helpful.