I’m currently trying to use accelerate to run Dreambooth via Automatic1111’s webui on 4x RTX 3090s.
Here’s my setup and what I’ve done so far, including the issues I’ve encountered and how I solved them:
OS: Ubuntu Mate 22.04
Environment Setup:
Using miniconda, created an environment named sd-dreambooth
cloned Auto1111’s repo, navigated to extensions, and cloned the Dreambooth extension
Running it with accelerate without modifying ./webui.sh causes multiple instances of the webui to be launched. I needed to add
--num_processes 1
to the accelerate launch args towards the end of the script.
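For reference, this is roughly what the accelerate line in webui.sh ends up looking like after the edit (a sketch from memory of the script, so the surrounding flags and exact wording may differ between commits):

# near the end of ./webui.sh, in the branch that prints "Accelerating launch.py..." (approximate)
accelerate launch --num_processes 1 --num_cpu_threads_per_process=1 "${LAUNCH_SCRIPT}" "$@"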
For some reason, cudatoolkit wasn’t installed by running the script, so I was getting an error related to:
"str2optimizer32bit"
Fixed by running:
conda install cudatoolkit
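If anyone hits the same thing and the unpinned install pulls a mismatched version, pinning cudatoolkit to the same CUDA minor version as the torch build (cu116 in my case) is probably the safer variant; note that this pinned form is an untested assumption on my part:

# assumption: match cudatoolkit to the torch 1.12.1+cu116 build
conda install -c conda-forge cudatoolkit=11.6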
I also noticed during launch that I was getting an error saying that triton wasn’t installed.
Fixed with:
pip install triton
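A quick sanity check that triton actually landed in the same env that accelerate is using (just a generic one-liner, nothing specific to this setup):

python -c "import triton; print(triton.__version__)"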
I usually use fp16, but after installing triton I started getting an error related to:
"slow_conv2d_cpu" not implemented for 'Half'
which, after some research, led me to believe I just had to disable mixed precision, so I added
--mixed_precision no
to the accelerate launch args as well. That solved that problem.
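(For anyone wondering what that error actually means: it’s a half-precision convolution being executed on the CPU rather than on a GPU. A minimal repro, assuming a plain torch install, should raise the same message:)

# half-precision conv2d on CPU tensors -> RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'
python -c "import torch; torch.nn.functional.conv2d(torch.randn(1,3,8,8).half(), torch.randn(4,3,3,3).half())"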
So currently, my accelerate launch is:
accelerate launch --multi_gpu --gpu_ids 0,1,2,3 --mixed_precision no --num_machines 1 --num_processes 1 --num_cpu_threads_per_process=1
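For anyone who prefers keeping these settings in accelerate’s config file instead of CLI flags, the equivalent should look roughly like this in ~/.cache/huggingface/accelerate/default_config.yaml (sketched from memory of accelerate’s config format, so field names and accepted values may differ slightly between accelerate versions):

compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 1
use_cpu: false

(GPU selection can stay on the command line via --gpu_ids, as above.)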
So, at this point I get no errors by using the following advanced settings in Dreambooth:
8 Bit Adam = Yes
Mixed Precision = No
Memory Attention = Default
Don’t Cache Latents = False
Train Text Encoder = True
Train EMA = True
Shuffle After Epoch = False
Pad Tokens = True
Gradient Checkpointing = False
The Web UI launches cleanly without errors (the line 129 message below is just because I’m using a conda env instead of the venv):
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################
################################################################
Running on lukium user
################################################################
################################################################
Repo already cloned, using it as install directory
################################################################
################################################################
Create and activate python venv
################################################################
./webui.sh: line 129: source: -/: invalid option
source: usage: source filename [arguments]
################################################################
Accelerating launch.py...
################################################################
Python 3.10.6 (main, Oct 24 2022, 16:07:47) [GCC 11.2.0]
Commit hash: 828438b4a190759807f9054932cae3a8b880ddf1
Installing requirements for Web UI
Installing requirements for Dreambooth
Checking Dreambooth requirements.
Dreambooth revision is c589a3596ade64228de8a7851f50c2470c7a76aa
Args: ['extensions/sd_dreambooth_extension/install.py']
[*] Diffusers version is 0.7.2.
[*] Torch version is 1.12.1+cu116.
[*] Torch vision version is 0.13.1+cu116.
[*] Transformers version is 4.21.0.
[*] Xformers
Launching Web UI with arguments: --ckpt-dir ./checkpoints --disable-safe-unpickle --xformers
Patching transformers to fix kwargs errors.
Dreambooth API layer loaded
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Loading weights [81761151] from /home/lukium/stable-diffusion/instances/sd-dreambooth/models/Stable-diffusion/sd-15/sd-v1-5.ckpt
Global Step: 840000
Using VAE found similar to selected model: /home/lukium/stable-diffusion/instances/sd-dreambooth/models/Stable-diffusion/sd-15/sd-v1-5.vae.pt
Loading VAE weights from: /home/lukium/stable-diffusion/instances/sd-dreambooth/models/Stable-diffusion/sd-15/sd-v1-5.vae.pt
Applying xformers cross attention optimization.
Model loaded.
Loaded a total of 0 textual inversion embeddings.
Embeddings:
Running on local URL: http://127.0.0.1:7860
Everything seems good and training works, but still only one GPU (GPU 0) gets used.
nvidia-smi shows all four cards are good to go:
Sat Nov 26 12:00:27 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.56.06 Driver Version: 520.56.06 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:02:00.0 Off | N/A |
| 57% 41C P8 38W / 370W | 5MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce ... Off | 00000000:06:00.0 Off | N/A |
| 0% 56C P8 30W / 350W | 5MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 NVIDIA GeForce ... Off | 00000000:09:00.0 Off | N/A |
| 0% 52C P8 22W / 350W | 5MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 NVIDIA GeForce ... Off | 00000000:0A:00.0 Off | N/A |
| 0% 52C P8 24W / 420W | 5MiB / 24576MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1706 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 1706 G /usr/lib/xorg/Xorg 4MiB |
| 2 N/A N/A 1706 G /usr/lib/xorg/Xorg 4MiB |
| 3 N/A N/A 1706 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------+
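And in case it helps anyone diagnosing this, a generic check that torch inside the conda env can see all four cards (just the standard one-liner, not specific to this setup):

python -c "import torch; print(torch.cuda.device_count())"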
Any suggestions on how to get all 4 GPUs working?