SageMakerConfig object has no attribute gpu_ids

jadechip · October 9, 2022, 7:38am

Hello everyone,

I am trying to run the Dreambooth training example from the diffusers repo (examples/dreambooth in huggingface/diffusers on GitHub) on SageMaker.

However I am getting the following error:

AttributeError: 'SageMakerConfig' object has no attribute 'gpu_ids'.

Here are the flags I am using:

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="s3://instance-images"
export OUTPUT_DIR="/opt/ml/model"

accelerate launch train_dreambooth.py \
  --aws_access_key_id="XXXXXXXXXX" \
  --aws_secret_access_key="XXXXXXXX" \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks cat" \
  --class_prompt="a photo of cat" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800

And here are the contents of my accelerate config file:

base_job_name: accelerate-sagemaker-1
compute_environment: AMAZON_SAGEMAKER
distributed_type: 'NO'
ec2_instance_type: ml.p3.2xlarge
iam_role_name: accelerate_sagemaker_execution_role
image_uri: null
mixed_precision: FP16
num_machines: 1
profile: null
py_version: py38
pytorch_version: 1.10.2
region: us-east-1
sagemaker_inputs_file: null
sagemaker_metrics_file: null
transformers_version: 4.17.0
use_cpu: false

Any clues on what I could be doing wrong?

To zoom out, my goal is to fine-tune a model with Dreambooth on SageMaker and save the resulting artifacts to S3.

Thank you for your time and assistance!

muellerzr · October 9, 2022, 4:14pm

What's your version of accelerate?

jadechip · October 9, 2022, 4:41pm

Version: 0.14.0.dev0

muellerzr · October 9, 2022, 9:25pm

Thanks @jadechip, can you try using accelerate installed via:

pip install git+https://github.com/huggingface/accelerate@fix-config

please? Thank you!

jadechip · October 11, 2022, 5:50am

@muellerzr Thank you so much for your help with this! Both the version you mentioned and accelerate==0.12.0 worked, so there might be something wrong with the 0.14.0.dev0 build.
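
Since the outcome here depended on which accelerate build was installed, it can help to confirm what is actually present in the environment. The following is a minimal sketch and not part of the original exchange; the helper names `installed_version` and `is_dev_build` are hypothetical, and the `.dev` check is only a heuristic based on the version string reported above (0.14.0.dev0).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_dev_build(version_string: str) -> bool:
    """Heuristic (assumption): a version containing '.dev' is a development build."""
    return ".dev" in version_string


if __name__ == "__main__":
    found = installed_version("accelerate")
    if found is None:
        print("accelerate is not installed")
    else:
        label = "development build" if is_dev_build(found) else "release"
        print(f"accelerate {found} ({label})")
```

Running this inside the training environment shows whether a development build or a tagged release is in use, which is the distinction that mattered in this thread.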

muellerzr · October 12, 2022, 11:43pm

This has now been solved on main. Thanks for your patience!
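
For readers who run into the same message: an error of this form means some code read `gpu_ids` from a config object whose class does not define that attribute. The sketch below only reproduces that general pattern with a stand-in class. `StandInSageMakerConfig`, `read_gpu_ids`, and the `"all"` default are hypothetical and chosen for illustration; this is not accelerate's actual code or the actual fix that landed on main.

```python
from dataclasses import dataclass


@dataclass
class StandInSageMakerConfig:
    """Hypothetical stand-in; it deliberately has no `gpu_ids` field."""
    ec2_instance_type: str = "ml.p3.2xlarge"
    num_machines: int = 1


def read_gpu_ids(config: object, default: str = "all") -> str:
    """Read `gpu_ids` if the config defines it, else fall back to `default`."""
    return getattr(config, "gpu_ids", default)


config = StandInSageMakerConfig()

try:
    config.gpu_ids  # direct access raises AttributeError on this class
except AttributeError as err:
    message = str(err)

# Defensive access avoids the exception (the "all" default is an assumption).
fallback = read_gpu_ids(config)
```

Here `message` holds the same kind of "object has no attribute 'gpu_ids'" text as the reported error, while `read_gpu_ids` returns the fallback value instead of raising.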
