I’m planning to train the stable-diffusion-2-inpainting model on my own dataset, which consists of image and mask.jpg pairs. I understand that the input should have 9 channels. Could you provide guidance on how to preprocess these image and mask files and structure them correctly for training? Specifically:
- What is the expected tensor structure for the 9-channel input?
- How should I combine my image and mask files to conform to this structure?
- Are there any specific preprocessing steps or code examples available?
Any guidance or example code would be very helpful.
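
To make the question concrete, here is a minimal sketch of how the 9-channel input could be assembled, assuming the layout used by the diffusers inpainting pipeline: 4 noisy latent channels, 1 mask channel resized to latent resolution, and 4 channels of latents from the masked image, concatenated in that order. The helper names (`prepare_pair`, `fake_vae_encode`, `build_unet_input`) are hypothetical, and `fake_vae_encode` is only a shape-preserving stand-in for the real VAE encoder so the snippet runs without downloading a model. The convention that white mask pixels mark the area to repaint is also an assumption about the dataset.

```python
import torch
import torch.nn.functional as F


def prepare_pair(image_u8: torch.Tensor, mask_u8: torch.Tensor):
    """image_u8: (3, H, W) uint8 RGB; mask_u8: (1, H, W) uint8, white = area to repaint (assumed)."""
    image = image_u8.float() / 127.5 - 1.0            # scale pixels to [-1, 1]
    mask = (mask_u8.float() / 255.0 >= 0.5).float()   # binarize: 1 = repaint, 0 = keep
    masked_image = image * (1.0 - mask)               # zero out the region to repaint
    return image, mask, masked_image


def fake_vae_encode(pixels: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for the VAE encoder: (B, 3, H, W) -> (B, 4, H/8, W/8)."""
    pooled = F.avg_pool2d(pixels, kernel_size=8)      # downsample by 8, like the real VAE
    return torch.cat([pooled, pooled[:, :1]], dim=1)  # pad to 4 channels (shape only)


def build_unet_input(noisy_latents, mask, masked_image_latents):
    """Concatenate along channels: 4 (noisy latents) + 1 (mask) + 4 (masked-image latents)."""
    mask_small = F.interpolate(mask, size=noisy_latents.shape[-2:], mode="nearest")
    return torch.cat([noisy_latents, mask_small, masked_image_latents], dim=1)


# Dummy 512x512 image and a square mask standing in for one image/mask pair.
image_u8 = torch.randint(0, 256, (3, 512, 512), dtype=torch.uint8)
mask_u8 = torch.zeros(1, 512, 512, dtype=torch.uint8)
mask_u8[:, 128:384, 128:384] = 255

image, mask, masked_image = prepare_pair(image_u8, mask_u8)
latents = fake_vae_encode(image[None])
masked_latents = fake_vae_encode(masked_image[None])
noisy_latents = latents + torch.randn_like(latents)   # placeholder for the scheduler's noising step
unet_input = build_unet_input(noisy_latents, mask[None], masked_latents)
print(unet_input.shape)  # torch.Size([1, 9, 64, 64])
```

In a real training loop the stand-in encoder would be replaced by the model's VAE (with its latent scaling factor applied) and the noise would come from the noise scheduler; only the preprocessing and channel order are illustrated here.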