SageMakerConfig object has no attribute gpu_ids

Hello everyone,

I am trying to run the DreamBooth training example from the diffusers repo (examples/dreambooth on the main branch of huggingface/diffusers) on SageMaker.

However, I am getting the following error:

AttributeError: 'SageMakerConfig' object has no attribute 'gpu_ids'.

Here are the flags I am using:

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="s3://instance-images"
export OUTPUT_DIR="/opt/ml/model"

accelerate launch train_dreambooth.py \
  --aws_access_key_id="XXXXXXXXXX" \
  --aws_secret_access_key="XXXXXXXX" \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks cat" \
  --class_prompt="a photo of cat" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800

And here are the contents of my accelerate config file:

base_job_name: accelerate-sagemaker-1
compute_environment: AMAZON_SAGEMAKER
distributed_type: 'NO'
ec2_instance_type: ml.p3.2xlarge
iam_role_name: accelerate_sagemaker_execution_role
image_uri: null
mixed_precision: FP16
num_machines: 1
profile: null
py_version: py38
pytorch_version: 1.10.2
region: us-east-1
sagemaker_inputs_file: null
sagemaker_metrics_file: null
transformers_version: 4.17.0
use_cpu: false

Any clues on what I could be doing wrong?

To zoom out, what I am trying to achieve is to fine-tune DreamBooth on SageMaker and save the resulting artifacts to S3.

Thank you for your time and assistance! :hugs:

What's your version of accelerate?

Version: 0.14.0.dev0
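
For anyone who wants to confirm the same thing from inside the training environment, here is a minimal sketch that reads the installed version with the standard library. The helper name `installed_version` is made up for illustration and is not part of accelerate.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if accelerate is installed, otherwise None.
print(installed_version("accelerate"))
```

Running this both locally and inside the SageMaker job may help, since the two environments can end up with different builds.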

Thanks @jadechip, can you try using accelerate installed via:

pip install git+https://github.com/huggingface/accelerate@fix-config

please? Thank you! :slight_smile:

@muellerzr Thank you so much for your help with this! Both the branch you mentioned and accelerate==0.12.0 worked, so there might be something wrong with the 0.14.0.dev0 build.
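
The error message reads like code accessing `gpu_ids` on a config object that does not define it. The sketch below only illustrates that general pattern with stand-in classes. It is not accelerate's actual code, and `LocalLikeConfig`, `SageMakerLikeConfig`, and both helper functions are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class LocalLikeConfig:
    # Hypothetical stand-in for a config type that carries a gpu_ids field.
    gpu_ids: str = "all"


@dataclass
class SageMakerLikeConfig:
    # Hypothetical stand-in for a config type without a gpu_ids field.
    ec2_instance_type: str = "ml.p3.2xlarge"


def read_gpu_ids_unsafe(config):
    # Direct attribute access raises AttributeError when the field is missing.
    return config.gpu_ids


def read_gpu_ids_safe(config, default=None):
    # getattr with a default tolerates config types that lack the field.
    return getattr(config, "gpu_ids", default)


if __name__ == "__main__":
    try:
        read_gpu_ids_unsafe(SageMakerLikeConfig())
    except AttributeError as err:
        print(f"unsafe access failed: {err}")
    print(read_gpu_ids_safe(SageMakerLikeConfig()))
```

This would be consistent with a build in which a shared code path assumes a field that only some config types have, and with the problem going away on versions where that assumption is absent. That explanation is a guess from the error message alone.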

This has now been fixed on main :slight_smile: Thanks for your patience!