Stable Diffusion Web UI - Scheduling Spaces - A10G Small

Hello,

I have a Stable Diffusion Web UI that I’m attempting to run on A10G Small hardware on Hugging Face. Unfortunately, it seems to get stuck at “Scheduling Space” for around 30-60+ minutes every time I try to wake the interface up.

Is there something I’m doing wrong? I wish Colab could provide a persistent URL; that would be an alternative solution.

Thanks!

Here’s the Dockerfile: Dockerfile - Web UI · GitHub

OK, it’s actually over 60 minutes most of the time on “Scheduling Space” on A10G Small.

I’ve updated the Dockerfile to use @camenduru’s T4 Private version, since that comes with 30 GB of RAM versus the 15 GB on the A10G Small, and am trying again.

Does anyone know how to speed up waking a Space?

Hello @iamrobotbear,

It seems the Docker image built from the Dockerfile in your Space iamrobotbear/webui-docker is too large to fit on A10G Small hardware, which leads to these scheduling issues.

To solve this, you should consider either slimming down the image, for example by removing some checkpoints, or using a hardware flavor with more storage available, such as A10G Large.

@chris-rannou

Appreciate the reply…

  • How do I know when a Space is running low on storage? There’s no mention of it in the build or container logs.

  • If I am out of storage, why would the Space eventually start on either T4 Medium or A10G Small?

  • When I add up the sizes of my checkpoints, I’m at 15.04 GB. Obviously there are also all of the dependencies and the OS, but since there’s no interactive terminal, how can I check?

Thanks!
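Since a Space offers no interactive terminal, one option is to run the image locally (for example with docker run) and measure from inside the container. Below is a minimal sketch under that assumption: it sums the file sizes under a checkpoint folder and reports free disk space. The folder path is an assumption based on the layout used later in this thread. It measures files, not image layers, so it will under-count an image that stores the same data in more than one layer.

```python
import os
import shutil

def dir_size_bytes(path: str) -> int:
    """Total size of the regular files under `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total

def to_gb(num_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_bytes / 10**9

if __name__ == "__main__":
    # Assumed checkpoint folder of the web UI; adjust to your own layout.
    models = "/content/stable-diffusion-webui/models/Stable-diffusion"
    if os.path.isdir(models):
        print(f"checkpoints: {to_gb(dir_size_bytes(models)):.2f} GB")
    usage = shutil.disk_usage("/")
    print(f"free on /: {to_gb(usage.free):.1f} GB of {to_gb(usage.total):.1f} GB")
```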

This kind of error is currently not handled correctly, which is why you don’t get clear information about what’s happening.

It “works” on T4 hosts because the underlying hosts have more total storage available.

Currently, the only way to check the size of your image is to build it locally.

We’ll be working on handling this case to provide you with more information if it happens again.

@chris-rannou

If I wanted to build an exact copy of my Space locally, is there a doc you can point me to? I don’t have that hardware, and it would be built on my MacBook Pro, so I’m not sure how to configure my Dockerfile.

Are you able to identify how large my total image is from your side since you have access to the backend?

How long should I expect my Space to take to start up if I cut this down to a single checkpoint on A10G?

Lastly, @chris-rannou or @camenduru: I’m trying to add authentication to the Space, either basic user/password or, even better, SSO (probably a pipe dream). Is that possible via Sharing Your App, or by adding secrets to the Space configuration, since this is a Docker build?

Essentially, I’d like to provide access to the UI without everyone needing a Hugging Face account or being added to my organization, and without letting the public run up a huge bill on A10G Large if the Space is public.

Thanks!
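For basic user/password auth, one approach is to store the credentials as a Space secret (Docker Spaces expose secrets to the container as environment variables) and pass them to the UI at launch. The sketch below only parses such a value; `WEBUI_AUTH` is a hypothetical secret name. A plain Gradio app can take the resulting list as `launch(auth=...)`, and, as far as I know, the AUTOMATIC1111 web UI accepts a `--gradio-auth user:pass` command-line flag (check `--help` on your version). This gates the UI only; it does not provide SSO, and visitors may still wake a public Space.

```python
import os
from typing import List, Tuple

def parse_auth(spec: str) -> List[Tuple[str, str]]:
    """Parse "u1:p1,u2:p2" into [("u1", "p1"), ("u2", "p2")].

    Only the first ':' in each entry separates user from password,
    so passwords may themselves contain ':'.
    """
    pairs = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        user, sep, password = entry.partition(":")
        if not sep or not user:
            raise ValueError(f"malformed auth entry: {entry!r}")
        pairs.append((user, password))
    return pairs

if __name__ == "__main__":
    # WEBUI_AUTH is a hypothetical secret name set in the Space settings.
    creds = parse_auth(os.environ.get("WEBUI_AUTH", ""))
    print(f"{len(creds)} user(s) configured")
```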

You should be able to build this Dockerfile without having the required hardware.
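A minimal sketch of building the image locally and reading its size, wrapping the Docker CLI from Python. Assumptions: Docker is installed, the Dockerfile is in the current directory, the local tag `webui-docker` is arbitrary, and the Space hardware is x86_64 Linux, so on an Apple Silicon MacBook the build requests `linux/amd64` (emulated, and slow). Building typically needs no GPU; only running the UI does.

```python
import subprocess
from typing import List

def build_cmd(tag: str, context: str = ".", platform: str = "linux/amd64") -> List[str]:
    """docker build command; --platform forces an x86_64 image on Apple Silicon."""
    return ["docker", "build", "--platform", platform, "-t", tag, context]

def size_cmd(tag: str) -> List[str]:
    """docker image inspect command that prints the image size in bytes."""
    return ["docker", "image", "inspect", "--format", "{{.Size}}", tag]

def build_and_measure(tag: str = "webui-docker", context: str = ".") -> float:
    """Build the image, then return its size in decimal GB (1 GB = 10**9 bytes)."""
    subprocess.run(build_cmd(tag, context), check=True)
    result = subprocess.run(size_cmd(tag), check=True, capture_output=True, text=True)
    return int(result.stdout.strip()) / 10**9
```

Usage, from the folder containing the Dockerfile: `print(f"{build_and_measure():.1f} GB")`.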

Currently your image is over 100 GB. This is mainly due to an optimization issue in the Dockerfile definition (the chown at the end). I’ll come back to you soon with a suggested optimization.
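Why would a chown at the end inflate the image? Each Dockerfile instruction produces a layer, and changing the ownership of a file that came from an earlier layer makes Docker store a full copy of that file in the new layer. The toy model below (hypothetical layer names and sizes, not measurements of this Space) counts every chowned layer twice:

```python
from typing import Dict, Iterable

def estimated_image_gb(layers: Dict[str, float], chowned: Iterable[str] = ()) -> float:
    """Toy estimate of an image's size in GB.

    `layers` maps a layer name to the data (GB) that layer adds. Each layer
    named in `chowned` is counted a second time, modelling a final
    `RUN chown -R` that stores full copies of those files in one more layer.
    """
    total = sum(layers.values())
    total += sum(layers[name] for name in set(chowned))
    return total

# Hypothetical sizes, for illustration only.
layers = {"cuda-base": 10.0, "apt-and-pip": 15.0, "content": 25.0}
print(estimated_image_gb(layers))               # 50.0: no late chown
print(estimated_image_gb(layers, ["content"]))  # 75.0: /content stored twice
```

The usual way to avoid the duplicate is to create files with the right owner from the start, for example by switching to the non-root user with `USER` before cloning the repo and downloading checkpoints, or by using `COPY --chown=...`.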

Once it is optimized, and if you reduce it to a single checkpoint, startup should take no longer than about 10 minutes (difficult to estimate beforehand). The main delay currently is the time required to download the image.
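If the image download dominates startup, pull time scales roughly linearly with image size. The back-of-the-envelope sketch below makes that concrete; the 100 MB/s pull rate is a hypothetical figure, not a measured Spaces number, and decompression and container start-up are ignored.

```python
def pull_minutes(image_gb: float, throughput_mb_s: float) -> float:
    """Minutes to download an image of `image_gb` (decimal GB) at `throughput_mb_s` MB/s.

    Ignores layer decompression and container start-up, so it is a lower bound.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    return image_gb * 1000 / throughput_mb_s / 60

# 100 MB/s is a hypothetical pull rate.
for size_gb in (100, 20):
    print(f"{size_gb} GB -> {pull_minutes(size_gb, 100):.1f} min")
```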

Any advice here on restricting access via password or SSO?

Thanks!

@chris-rannou

Hi @iamrobotbear 👋, let’s try it. I am very curious now.

Traceback (most recent call last):
  File "/content/stable-diffusion-webui/webui.py", line 12, in <module>
    from modules.call_queue import wrap_queued_call, queue_lock, wrap_gradio_gpu_call
  File "/content/stable-diffusion-webui/modules/call_queue.py", line 7, in <module>
    from modules import shared
  File "/content/stable-diffusion-webui/modules/shared.py", line 125, in <module>
    os.makedirs(cmd_opts.hypernetwork_dir, exist_ok=True)
  File "/usr/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/content/stable-diffusion-webui/models/hypernetworks'

@chris-rannou, how can we solve this without chown and chmod?
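The traceback shows the web UI calling `os.makedirs` on `models/hypernetworks` at startup without permission to write there, which usually means the process runs as a non-root user while the files belong to root. The hypothetical helper below reports whether a path is writable, creatable, or blocked, and by which ancestor. The chown-free fix is usually to create the files as that user in the first place (for example, cloning after a `USER` instruction) rather than fixing ownership afterwards.

```python
import os

def diagnose_dir(path: str) -> str:
    """Report whether the current process can use `path` as a writable directory.

    Returns "writable", "creatable", or a message naming the closest existing
    ancestor that would make os.makedirs(path) fail.
    """
    path = os.path.abspath(path)
    if os.path.isdir(path):
        # Creating entries in a directory needs write and search (x) permission.
        ok = os.access(path, os.W_OK | os.X_OK)
        return "writable" if ok else f"exists but not writable by uid {os.getuid()}"
    # Walk up to the closest ancestor that already exists as a directory.
    parent = os.path.dirname(path)
    while not os.path.isdir(parent):
        parent = os.path.dirname(parent)
    if os.access(parent, os.W_OK | os.X_OK):
        return "creatable"
    return f"cannot create: {parent} is not writable by uid {os.getuid()}"

print(diagnose_dir("/content/stable-diffusion-webui/models/hypernetworks"))
```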


@chris-rannou, please tell us where we can find documentation on why chown causes a 100+ GB image size.

@iamrobotbear I opened a PR on your Space to suggest an optimization of your dockerfile.

@chris-rannou Do I need to merge it, or will it build automatically?

Any guidance on how I can restrict access to the Space so that it’s not publicly accessible? I want to be able to share the URL, or ideally an iframe / web component, without making the Space public.

Thanks!

@chris-rannou The PR was merged and the build failed. Trying a factory reboot.

Nope, the build still fails.

===== Build Queued at 2023-02-13 17:48:24 / Commit SHA: 83c3ca0 =====

--> FROM docker.io/nvidia/cuda:11.7.1-cudnn8-devel-ubuntu22.04@sha256:a73dd27ec3a249cd09d6833d1c7c41b810ad12e4577d5b5d5d11375d438b6528
DONE 0.0s
DONE 41.4s
DONE 58.5s

--> RUN adduser --disabled-password --gecos '' user
Adding user `user' ...
Adding new group `user' (1000) ...
Adding new user `user' (1000) with group `user' ...
Creating home directory `/home/user' ...
Copying files from `/etc/skel' ...
DONE 28.4s

--> WORKDIR /content
DONE 0.1s

--> RUN sed -i 's http://deb.debian.org http://cdn-aws.deb.debian.org g' /etc/apt/sources.list && sed -i 's http://archive.ubuntu.com http://us-east-1.ec2.archive.ubuntu.com g' /etc/apt/sources.list && sed -i '/security/d' /etc/apt/sources.list && apt-get update -y && apt-get upgrade -y && apt-get install -y libgl1 libglib2.0-0 wget git git-lfs python3-pip python-is-python3 && pip3 install --upgrade pip
sed: couldn't open temporary file /etc/apt/sedITYtfY: Permission denied

--> ERROR: process "/bin/sh -c sed -i 's http://deb.debian.org http://cdn-aws.deb.debian.org g' /etc/apt/sources.list && sed -i 's http://archive.ubuntu.com http://us-east-1.ec2.archive.ubuntu.com g' /etc/apt/sources.list && sed -i '/security/d' /etc/apt/sources.list && apt-get update -y && apt-get upgrade -y && apt-get install -y libgl1 libglib2.0-0 wget git git-lfs python3-pip python-is-python3 && pip3 install --upgrade pip" did not complete successfully: exit code: 4
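The failing step is the first `sed -i` on /etc/apt/sources.list. GNU sed -i writes a temporary file next to the target and then renames it into place, so it needs write permission on /etc/apt itself; exit code 4 is sed's I/O-error status. That suggests this RUN now executes as the non-root `user` (presumably because the updated Dockerfile switches users earlier), and the usual fix is to run the apt steps before the `USER` switch. The Python analogue below is a sketch, not sed's implementation; it shows the same temp-file-then-rename pattern and where directory write access is required.

```python
import os
import tempfile

def replace_in_place(path: str, old: str, new: str) -> None:
    """Rough analogue of GNU `sed -i 's old new g' path` for a literal string.

    Like sed -i, it writes the result to a temporary file in the same
    directory and renames it over the original, so it needs write access
    to the directory itself, not just to the file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, encoding="utf-8") as src:
        text = src.read()
    # Raises PermissionError on a read-only directory, the counterpart of
    # sed's "couldn't open temporary file ... Permission denied".
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as dst:
            dst.write(text.replace(old, new))
        os.chmod(tmp, os.stat(path).st_mode & 0o7777)  # keep original permissions
        os.replace(tmp, path)  # atomic rename on POSIX
    except BaseException:
        os.unlink(tmp)
        raise
```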

I want to clarify something.

@chris-rannou said

To solve this you should consider either slimming down the image by maybe removing some checkpoints or use a flavor with more storage available such as A10G large.

but,

In https://huggingface.co/spaces/iamrobotbear/webui-docker, there is a command runner under the Hugging Face tab that you can use to try this yourself.

As shown in the screenshot below, we have 114G of free space, which means we can store around 20 models if each is 5 GB.
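The arithmetic behind "around 20 models" is a floor division, optionally keeping some space in reserve for everything else. A small sketch follows; the 10 GB reserve is a hypothetical margin. Note that `df -h` reports binary units, so 114G is about 122 decimal GB, which barely changes the estimate.

```python
import shutil

def models_that_fit(free_gb: float, model_gb: float, reserve_gb: float = 0.0) -> int:
    """Number of whole checkpoints of `model_gb` that fit in `free_gb`,
    after setting aside `reserve_gb` for everything else."""
    usable = free_gb - reserve_gb
    return max(0, int(usable // model_gb))

print(models_that_fit(114, 5))  # 22
# Same check against this machine's root filesystem, with a hypothetical 10 GB reserve.
free_gb = shutil.disk_usage("/").free / 10**9
print(models_that_fit(free_gb, 5, reserve_gb=10))
```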

[Screenshot 2023-02-14 035024]

We can download a model after the Docker build finishes by going to Run Command under the Hugging Face tab and using the following command:

wget https://huggingface.co/ckpt/anything-v4.5-vae-swapped/resolve/main/anything-v4.5-vae-swapped.safetensors -O /content/stable-diffusion-webui/models/Stable-diffusion/anything-v4.5-vae-swapped.safetensors
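The same download can be scripted in Python using only the standard library. This is a sketch, not part of the web UI: `download_checkpoint` is a hypothetical helper that builds the same `resolve/main` URL as the wget command, skips files already present, and only moves the file into place once it is complete. Files fetched at runtime live on the Space's disk, which is typically ephemeral, so they may need downloading again after a restart.

```python
import os
import urllib.request

def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Direct-download URL for a file in a Hugging Face model repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

def download_checkpoint(repo_id: str, filename: str, models_dir: str) -> str:
    """Download `filename` into `models_dir` unless it is already there."""
    os.makedirs(models_dir, exist_ok=True)
    dest = os.path.join(models_dir, filename)
    if not os.path.exists(dest):
        tmp = dest + ".part"
        urllib.request.urlretrieve(resolve_url(repo_id, filename), tmp)
        os.replace(tmp, dest)  # expose the file only once it is complete
    return dest

# Equivalent of the wget command above (not run here):
# download_checkpoint("ckpt/anything-v4.5-vae-swapped",
#                     "anything-v4.5-vae-swapped.safetensors",
#                     "/content/stable-diffusion-webui/models/Stable-diffusion")
```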

I believe this solves the problem, but I am still working on optimizing the Dockerfile.

Also, we are waiting for Amazon EKS Distro v1.25 to get root access back like the good old days 🥳 (GitHub - aws/eks-distro at 1.25-1).

This means we could use apt-get install or sudo inside the running VM, and we wouldn’t need to optimize around chown and chmod.


And /content is only 4.2G.

If you want a cooler 😎 way to download models, there is the CivitAI Web UI extension.

You can add it like this:

RUN git clone -b v1.6 https://github.com/camenduru/sd-civitai-browser /content/stable-diffusion-webui/extensions/sd-civitai-browser

Please create the Space inside your organization and make it private, so that only members of the organization can see it.

@Omnibus cool 🔥