Hello Hugging Face Community,
I am facing an issue with my fine-tuned InstructPix2Pix model based on Stable Diffusion 1.5. The model was trained to transform blank 2D floorplans (input) into floorplans with colored circles (ground truth/target). My dataset consists of ~540 unique image pairs, which I successfully loaded locally with a prompt for each pair.
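For context, this is roughly how the pairs are put together (a minimal sketch only; the file paths below are placeholders, and the column names are the ones I believe train_instruct_pix2pix.py expects by default: input_image, edited_image, edit_prompt):

# Rough sketch of how the paired dataset is assembled locally.
# File paths are placeholders; column names follow what I understand to be
# the script's defaults (input_image, edited_image, edit_prompt).
from datasets import Dataset, Features, Image, Value

records = {
    "input_image": ["pairs/blank/FP_0001.png"],     # blank 2D floorplan
    "edited_image": ["pairs/circles/FP_0001.png"],  # same plan with colored circles
    "edit_prompt": ["Mark plant positions on this 2D floorplan with colored circles."],
}

features = Features({
    "input_image": Image(),
    "edited_image": Image(),
    "edit_prompt": Value("string"),
})

dataset = Dataset.from_dict(records, features=features)
print(dataset)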
During training, the validation outputs showed gradual improvement up to around 50 epochs, at which point the model started generating circles on the validation images. After 50 epochs, however, the outputs began to replicate the input image instead of applying the desired transformation.
Training Parameters
Here are the key parameters I used during training:
"accelerate", "launch", "--mixed_precision=fp16", "train_instruct_pix2pix.py",
--pretrained_model_name_or_path ${MODEL_NAME}
--train_data_dir ${DATASET_ID}
--resolution 256
--random_flip
--train_batch_size 4
--gradient_accumulation_steps 4
--gradient_checkpointing
--num_train_epochs 50
--checkpointing_steps 500
--checkpoints_total_limit 1
--lr_scheduler constant
--max_grad_norm 1
--lr_warmup_steps 0
--val_image_url ###
--validation_prompt "Mark plant positions on this 2D floorplan with colored circles. Place large red pots in open spaces, paired only with orange. Groups of three emphasize triangular setups using red, blue, and orange. Avoid straight-line groupings or escalating sizes."
--report_to wandb
--conditioning_dropout_prob 0.05
--seed 42
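For reference, a quick back-of-the-envelope on what these settings work out to, assuming a single GPU process (so the effective batch size is train_batch_size × gradient_accumulation_steps):

# Rough step arithmetic for the run above (assuming a single GPU process).
import math

num_pairs = 540                   # ~540 unique image pairs
train_batch_size = 4
gradient_accumulation_steps = 4
num_train_epochs = 50

effective_batch = train_batch_size * gradient_accumulation_steps   # 16
steps_per_epoch = math.ceil(num_pairs / effective_batch)           # 34
total_steps = steps_per_epoch * num_train_epochs                   # 1700

print(effective_batch, steps_per_epoch, total_steps)               # 16 34 1700

With --checkpointing_steps 500 and --checkpoints_total_limit 1, that means only a handful of checkpoints are written over the whole run, and only the most recent one is kept.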
Steps I Have Tried
- Lowered the learning rate (e.g., 5e-05) as well as tried a higher one (2e-04)
- Used both constant and linear learning rate schedulers
- Experimented with different CFG weights during inference (see the sweep sketch after the inference script below)
Unfortunately, the issue persists: at inference time, the model outputs essentially the same input image rather than applying the transformation it was trained on.
Inference Script:
from diffusers import StableDiffusionInstructPix2PixPipeline, DDPMScheduler
from PIL import Image
import torch
# Load the pipeline
repo_path = "/home/ec2-user/SageMaker/diffusers/examples/instruct_pix2pix/instruct-pix2pix-model_LR_linear"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(repo_path, torch_dtype=torch.float16)
# Use the DDPMScheduler (used in training)
pipe.scheduler = DDPMScheduler.from_config(pipe.scheduler.config)
# Move the pipeline to GPU
device = "cuda"
pipe = pipe.to(device)
# Load and preprocess the input image
image_path = "/home/ec2-user/SageMaker/diffusers/examples/instruct_pix2pix/dataset/test/FP_NP_ 111.png"
image = Image.open(image_path).convert("RGB")
image = image.resize((512, 512))
# Define the prompt and inference parameters
prompt = "recreate the floorplan with coloured circles on it"
guidance_scale = 7.5
num_inference_steps = 50
seed = 42
# Set up the generator for reproducibility
generator = torch.Generator(device).manual_seed(seed)
# Run inference
try:
    images = pipe(
        prompt=prompt,
        image=image,  # Include the conditioning image
        guidance_scale=guidance_scale,
        num_inference_steps=num_inference_steps,
        generator=generator
    ).images
    # Save the output image
    output_path = "output_image_ddpm.png"
    images[0].save(output_path)
    print(f"Image saved to: {output_path}")
except RuntimeError as e:
    print(f"Error during inference: {e}")
My Question
How can I prevent the model from simply replicating the input image at inference time and get it to actually apply the transformation it learned from the ground-truth/target images? Are there any suggestions for parameter tuning or model adjustments that might help?
Thank you for your time and guidance!