I am trying to fine-tune the Stable Diffusion 2.1 model on a custom dataset, using the example script provided here. The training starts fine and the loss decreases, but then, at some random point, the loss becomes NaN and the model starts to output black images. I don't think it is a dataset issue, since this once happened only after a full epoch (i.e., it had already iterated over every sample). The failure point is completely random: I once got 1800 steps in before hitting the problem, but on average it happens after around 200 steps.
I have tried two different datasets. One was a custom dataset of 2000 samples of 768x768 images. The other was this fashion dataset from Kaggle, of which I used a subset of around 1800 images.
I have tried batch sizes up to 4, and I have trained both with and without the xformers flag.
Any ideas what the problem could be?
The command I am using (the training script is unchanged):

```
accelerate launch train_text_to_image.py --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1" --train_data_dir="dataset\dataset_fashion" --resolution=512 --center_crop --train_batch_size=1 --gradient_accumulation_steps=1 --gradient_checkpointing --max_train_steps=10000 --learning_rate=1e-6 --max_grad_norm=0.5 --lr_scheduler="constant" --output_dir="output" --checkpointing_steps=200 --enable_xformers_memory_efficient_attention
```
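For debugging, this is the kind of guard I could add around the loss inside the training loop to catch the exact step where it goes non-finite (a minimal sketch, not part of the stock script; `loss_is_finite` is my own helper):

```python
import torch

def loss_is_finite(loss: torch.Tensor) -> bool:
    """Return False as soon as the loss contains NaN or Inf,
    which is the failure mode described above."""
    return bool(torch.isfinite(loss).all())

# Example: a NaN loss is caught before it poisons the optimizer state.
good = torch.tensor(0.123)
bad = torch.tensor(float("nan"))
print(loss_is_finite(good))  # True
print(loss_is_finite(bad))   # False
```

In the loop, one could skip `optimizer.step()` (or dump the batch indices) whenever this returns False, to check whether specific samples or a particular step trigger the NaN.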
- GPU: 24GB Titan RTX
- Platform: Windows-10-10.0.19044-SP0
- Python version: 3.9.16
- PyTorch version (GPU?): 1.13.1+cu116 (True)
- Huggingface_hub version: 0.13.3
- Transformers version: 4.27.3
- Accelerate version: 0.18.0
- xFormers version: 0.0.16
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No