Dreambooth not generating model_index.json and thus is not able to make inference

Hello, I am using Dreambooth with the runwayml/stable-diffusion-v1-5 model, and after executing it I only get the following folders:

  • logs
  • safety_checker
  • scheduler
  • text_encoder
  • tokenizer
  • unet
  • vae

Nothing else is being generated. Thus, when I try to run inference, the following error is raised:

OSError: Error no file named model_index.json found in directory

The most interesting thing is that the pipeline I had was working perfectly before upgrading the diffusers library, so I assume that something has changed and the training-to-inference flow no longer works. I really appreciate any help with this issue. Thanks.

Hello @SrLozano!

I’m not able to reproduce this problem; a model_index.json file is being correctly generated for me.

Could you please post the exact command line invocation you are using, as well as the output from diffusers-cli env?

Pedro, thank you for taking the time to respond. I appreciate it.

The exact command line invocation I am using is the following:

export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export INSTANCE_DIR="/zhome/d1/6/diffusers/examples/dreambooth/images"
export OUTPUT_DIR="/zhome/d1/6/diffusers/examples/dreambooth/saved_model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=400

On the other hand, the output from diffusers-cli env is:

- `diffusers` version: 0.17.0.dev0
- Platform: Linux-3.10.0-1160.88.1.el7.x86_64-x86_64-with-glibc2.17
- Python version: 3.8.13
- PyTorch version (GPU?): 1.13.1+cu117 (False)
- Huggingface_hub version: 0.14.1
- Transformers version: 4.28.1
- Accelerate version: 0.18.0
- xFormers version: not installed
- Using GPU in script?: Yes, I am using an A100 from a computing cluster
- Using distributed or parallel set-up in script?: No

I noticed the error when I updated from version 0.13.1 to 0.17.0. I have a script that automatically runs inference to generate images from a model trained using Dreambooth, and everything was working smoothly. When the error appeared, I realised that Dreambooth is no longer outputting the same files it used to. I am using the recommended pipeline for inference:

from diffusers import DiffusionPipeline
import torch

model_id = "path_to_saved_model"
pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A photo of sks dog in a bucket"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]

image.save("dog-bucket.png")

Everything’s very similar to my setup, and in my case the model_index.json file is indeed saved. Could you maybe open an issue on GitHub and reference this post so we can get more eyes on it?

Sure, I’ll do it! Thank you very much for your time, Pedro.

Having the same issue here.

- `diffusers` version: 0.17.0.dev0
- Platform: Linux-5.15.0-69-generic-x86_64-with-glibc2.27
- Python version: 3.10.9
- PyTorch version (GPU?): 2.0.1+cu117 (True)
- Huggingface_hub version: 0.14.1
- Transformers version: 4.29.2
- Accelerate version: 0.19.0
- xFormers version: not installed
- Using GPU in script?: 4090
- Using distributed or parallel set-up in script?: No

training `DreamBooth` using `Stable-Diffusion-v1-5`

The link is apparently [Dreambooth not generating model_index.json and thus is not able to make inference · Issue #3468 · huggingface/diffusers · GitHub](https://github.com/huggingface/diffusers/issues/3468).

I read two more issues about the same problem, which unfortunately are just unstructured support requests and abridged solutions, and I could extract the following theory:

The model_id passed to DiffusionPipeline.from_pretrained(...) is not the local directory, as you might think from the docs, but the identifier of the remote model, which needs to be pushed to the Hugging Face Hub after training and downloaded again to be used, i.e. there is no local approach.

Maybe you can patch the files from the local cache in ~/.cache/huggingface, or maybe this is all wrong; I’m just compensating for the lack of comprehensive, clear docs with speculation and trial and error.
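
For what it’s worth, one purely local workaround I would try is to reassemble the pipeline from the subfolders the training run did write and save it again, so that diffusers writes a model_index.json. This is just a sketch, not a confirmed fix: it assumes the unet, text_encoder, vae and tokenizer subfolders contain valid weights, it pulls the remaining components from the base model, and you have to adapt the output path to your setup.

import torch
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL, StableDiffusionPipeline, UNet2DConditionModel

output_dir = "path_to_saved_model"  # the DreamBooth --output_dir

# Load the fine-tuned components from the subfolders that DreamBooth did write.
unet = UNet2DConditionModel.from_pretrained(output_dir, subfolder="unet")
text_encoder = CLIPTextModel.from_pretrained(output_dir, subfolder="text_encoder")
vae = AutoencoderKL.from_pretrained(output_dir, subfolder="vae")
tokenizer = CLIPTokenizer.from_pretrained(output_dir, subfolder="tokenizer")

# Take everything else (scheduler, safety checker, feature extractor) from the
# base model and swap in the fine-tuned parts.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    unet=unet,
    text_encoder=text_encoder,
    vae=vae,
    tokenizer=tokenizer,
    torch_dtype=torch.float32,
)

# Saving the assembled pipeline should write model_index.json into output_dir.
pipe.save_pretrained(output_dir)

If that works, DiffusionPipeline.from_pretrained(output_dir) should load locally again, and it would at least confirm that the trained weights are fine and only the final pipeline save step is being skipped.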