Hello Hugging Face Community,
I am facing an issue with my fine-tuned InstructPix2Pix model based on Stable Diffusion 1.5. The model was trained to transform blank 2D floorplans (input) into floorplans with colored circles (ground truth/target). My dataset consists of ~540 unique image pairs, which I successfully loaded locally with a prompt for each pair.
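For context, this is roughly how the pairs are put together (a minimal sketch only; the file paths below are placeholders, and the column names are the ones I believe train_instruct_pix2pix.py expects by default: input_image, edited_image, edit_prompt):

# Rough sketch of how the paired dataset is assembled locally.
# File paths are placeholders; column names follow what I understand to be
# the script's defaults (input_image, edited_image, edit_prompt).
from datasets import Dataset, Features, Image, Value

records = {
    "input_image": ["pairs/blank/FP_0001.png"],     # blank 2D floorplan
    "edited_image": ["pairs/circles/FP_0001.png"],  # same plan with colored circles
    "edit_prompt": ["Mark plant positions on this 2D floorplan with colored circles."],
}

features = Features({
    "input_image": Image(),
    "edited_image": Image(),
    "edit_prompt": Value("string"),
})

dataset = Dataset.from_dict(records, features=features)
print(dataset)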
During training, the validation outputs showed gradual improvement up to around 50 epochs, at which point the model started generating circles on the validation images. After 50 epochs, however, the outputs began to replicate the input image instead of applying the desired transformation.
Training Parameters
Here are the key parameters I used during training:
"accelerate", "launch", "--mixed_precision=fp16", "train_instruct_pix2pix.py",
--pretrained_model_name_or_path ${MODEL_NAME}
--train_data_dir ${DATASET_ID}
--resolution 256
--random_flip
--train_batch_size 4
--gradient_accumulation_steps 4
--gradient_checkpointing
--num_train_epochs 50
--checkpointing_steps 500
--checkpoints_total_limit 1
--lr_scheduler constant
--max_grad_norm 1
--lr_warmup_steps 0
--val_image_url ###
--validation_prompt "Mark plant positions on this 2D floorplan with colored circles. Place large red pots in open spaces, paired only with orange. Groups of three emphasize triangular setups using red, blue, and orange. Avoid straight-line groupings or escalating sizes."
--report_to wandb
--conditioning_dropout_prob 0.05
--seed 42
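For reference, a quick back-of-the-envelope on what these settings work out to, assuming a single GPU process (so the effective batch size is train_batch_size × gradient_accumulation_steps):

# Rough step arithmetic for the run above (assuming a single GPU process).
import math

num_pairs = 540                   # ~540 unique image pairs
train_batch_size = 4
gradient_accumulation_steps = 4
num_train_epochs = 50

effective_batch = train_batch_size * gradient_accumulation_steps   # 16
steps_per_epoch = math.ceil(num_pairs / effective_batch)           # 34
total_steps = steps_per_epoch * num_train_epochs                   # 1700

print(effective_batch, steps_per_epoch, total_steps)               # 16 34 1700

With --checkpointing_steps 500 and --checkpoints_total_limit 1, that means only a handful of checkpoints are written over the whole run, and only the most recent one is kept.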
Steps I Have Tried
- Lowered the learning rate (e.g., 5e-05) as well as tried a higher one (2e-04)
- Used both constant and linear learning rate schedulers
- Experimented with different CFG weights during inference (see the sweep sketch after the inference script below)
Unfortunately, the issue persists: at inference time, the model outputs essentially the same input image rather than applying the transformation it was trained on.
Inference Script:
from diffusers import StableDiffusionInstructPix2PixPipeline, DDPMScheduler
from PIL import Image
import torch
# Load the pipeline
repo_path = "/home/ec2-user/SageMaker/diffusers/examples/instruct_pix2pix/instruct-pix2pix-model_LR_linear"
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(repo_path, torch_dtype=torch.float16)
# Use the DDPMScheduler (used in training)
pipe.scheduler = DDPMScheduler.from_config(pipe.scheduler.config)
# Move the pipeline to GPU
device = "cuda"
pipe = pipe.to(device)
# Load and preprocess the input image
image_path = "/home/ec2-user/SageMaker/diffusers/examples/instruct_pix2pix/dataset/test/FP_NP_ 111.png"
image = Image.open(image_path).convert("RGB")
image = image.resize((512, 512))
# Define the prompt and inference parameters
prompt = "recreate the floorplan with coloured circles on it"
guidance_scale = 7.5
num_inference_steps = 50
seed = 42
# Set up the generator for reproducibility
generator = torch.Generator(device).manual_seed(seed)
# Run inference
try:
    images = pipe(
        prompt=prompt,
        image=image,  # Include the conditioning image
        guidance_scale=guidance_scale,
        num_inference_steps=num_inference_steps,
        generator=generator
    ).images
    # Save the output image
    output_path = "output_image_ddpm.png"
    images[0].save(output_path)
    print(f"Image saved to: {output_path}")
except RuntimeError as e:
    print(f"Error during inference: {e}")
My Question
How can I prevent the model from simply replicating the input image at inference time and get it to actually apply the transformation it learned from the ground-truth/target images? Are there any suggestions for parameter tuning or model adjustments that might help?
Thank you for your time and guidance!