Tiny fine-tune messed up the pretrained SD v1-4 model

I'm new to diffusers and am following the instructions in diffusers/examples/text_to_image at main · huggingface/diffusers · GitHub

I made some small changes to the training script and the inference code.

training

export MODEL_NAME="CompVis/stable-diffusion-v1-4"

export dataset_name="lambdalabs/pokemon-blip-captions"
# using CPU
accelerate launch --mixed_precision="no"  train_text_to_image.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --dataset_name=$dataset_name \
  --use_ema \
  --resolution=512 --center_crop --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --max_train_steps=10 \
  --learning_rate=1e-03 \
  --max_grad_norm=1 \
  --lr_scheduler="constant" --lr_warmup_steps=0 \
  --output_dir="sd-pokemon-model" 

inference

from diffusers import StableDiffusionPipeline

model_path = "sd-pokemon-model"  # the --output_dir from the training run above
pipe = StableDiffusionPipeline.from_pretrained(model_path, safety_checker=None, requires_safety_checker=False)
pipe = pipe.to("cpu")
# Recommended if your computer has < 64 GB of RAM
pipe.enable_attention_slicing()
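
A quick way to sanity-check the pipeline after loading, a minimal sketch (the prompt and step count here are just examples):

# Generate one image to eyeball the fine-tuned model.
image = pipe("a warrior on horse", num_inference_steps=30).images[0]
image.save("sample.png")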

However, even this short 10-step training run made the model totally messy; inference just returns pure noise.

Question: with the pokemon data, even only 10 training steps made the results totally messy. Is anything here incompatible with fine-tuning?

BTW, when I change it to 0 steps, everything works fine: inference returns results as good as the pretrained model, so the model loading and saving parts are OK.
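
To separate a save/load problem from a training problem, one could also compare the UNet weights before and after fine-tuning. A rough diagnostic sketch, assuming the fine-tuned model was saved to sd-pokemon-model as above:

import torch
from diffusers import UNet2DConditionModel

# Load the UNet from both checkpoints and see how far the weights moved.
base = UNet2DConditionModel.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="unet")
tuned = UNet2DConditionModel.from_pretrained("sd-pokemon-model", subfolder="unet")

with torch.no_grad():
    max_delta = max(
        (p1 - p0).abs().max().item()
        for (_, p0), (_, p1) in zip(base.named_parameters(), tuned.named_parameters())
    )
print("largest per-weight change:", max_delta)  # a huge value would hint the updates blew up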

thanks!

Could you provide the prompts you used for the comparison and the images you got from those prompts?

0 training steps means there was no fine-tuning at all. It may simply be that the pre-trained model generated a plausible image with respect to the input prompt you gave it.

I'm using random prompts, e.g. "yoda", "a warrior on horse", etc.
The issue here is that, after loading the pretrained model, just a few training steps (even with a small learning rate on a small dataset) mess up the entire model, which looks unexpected. I suspect some config or format issue.
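
For a fair side-by-side, I can generate from both checkpoints with the same seed. A rough sketch (paths are the ones from my commands above):

import torch
from diffusers import StableDiffusionPipeline

prompt = "a warrior on horse"
for path in ("CompVis/stable-diffusion-v1-4", "sd-pokemon-model"):
    pipe = StableDiffusionPipeline.from_pretrained(
        path, safety_checker=None, requires_safety_checker=False
    ).to("cpu")
    generator = torch.Generator("cpu").manual_seed(0)  # same seed for both runs
    image = pipe(prompt, generator=generator, num_inference_steps=30).images[0]
    image.save(path.split("/")[-1] + ".png")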