Docker Space, connection problems

Spaces…
Public:

Private:

The problems described in these posts happen on all of them. A through C have identical contents; W has small changes in the UI config. The public space is a pared-down version of the others, but the scripts are basically the same.

Ever since the build errors several people had about 24 hours ago (such as "Build error without log"), I have found Spaces to be very unstable. I keep getting connection errors, over and over again, and now a script that has worked fine for weeks, pulling in models from civitai, keeps throwing 403 errors.

---------------
Running script './on_start.sh' to download models ...
---------------
$ download-model --lora "BulkedUpAIR1.5.safetensors" "https://civitai.com/api/download/models/33323"


06/21 07:17:07 [ERROR] CUID#7 - Download aborted. URI=https://civitai.com/api/download/models/33323
Exception: [AbstractCommand.cc:351] errorCode=22 URI=https://civitai.com/api/download/models/33323
  -> [HttpSkipResponseCommand.cc:239] errorCode=22 The response status is not successful. status=403

The space is https://huggingface.co/spaces/MiroCollas/MC_WebUI_Simple_A
which was set up as Docker Blank. It uses aria2 to download.

A 403 is, I believe, “forbidden”, but by whom? Civitai or HF?
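A status code in an aria2 log is the one returned by whichever server answered the request for that URI, so reading the host out of the log at least narrows the question. The sketch below pulls (host, status) pairs out of log text in the format quoted above. The function name `failed_requests` and the two regexes are illustrative assumptions, not part of aria2, and a CDN or proxy sitting in front of the host could still be the party that actually refused the request.

```python
import re
from urllib.parse import urlparse

# Patterns written against the aria2 lines quoted in this thread (an assumption):
# an "Exception: ... URI=<url>" line followed by a "... status=<code>" line.
URI_RE = re.compile(r"Exception: .*?URI=(\S+)")
STATUS_RE = re.compile(r"status=(\d{3})")


def failed_requests(log_text):
    """Return a list of (host, http_status) pairs for failed downloads."""
    results = []
    current_host = None
    for line in log_text.splitlines():
        uri = URI_RE.search(line)
        if uri:
            # Remember which host this exception refers to.
            current_host = urlparse(uri.group(1)).hostname
            continue
        status = STATUS_RE.search(line)
        if status and current_host:
            results.append((current_host, int(status.group(1))))
            current_host = None
    return results
```

Fed the three error lines above, this reports `civitai.com` as the host that the 403 was returned for.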

If there are changes which affect spaces, that users need to know about so we can adapt and edit scripts, where are these changes posted?

Sod’s law… I post the above, and 5 mins later, it starts working again. But for how long?

LOL I knew it was too good to be true. First this…

So I rebooted … then I get this:

Cloning Stable Diffusion into /app/stable-diffusion-webui/repositories/stable-diffusion-stability-ai...
Cloning Taming Transformers into /app/stable-diffusion-webui/repositories/taming-transformers...
Cloning K-diffusion into /app/stable-diffusion-webui/repositories/k-diffusion...
Cloning CodeFormer into /app/stable-diffusion-webui/repositories/CodeFormer...
Traceback (most recent call last):
  File "/app/stable-diffusion-webui/launch.py", line 38, in <module>
    main()
  File "/app/stable-diffusion-webui/launch.py", line 29, in main
    prepare_environment()
  File "/app/stable-diffusion-webui/modules/launch_utils.py", line 291, in prepare_environment
    git_clone(codeformer_repo, repo_dir('CodeFormer'), "CodeFormer", codeformer_commit_hash)
  File "/app/stable-diffusion-webui/modules/launch_utils.py", line 147, in git_clone
    run(f'"{git}" clone "{url}" "{dir}"', f"Cloning {name} into {dir}...", f"Couldn't clone {name}")
  File "/app/stable-diffusion-webui/modules/launch_utils.py", line 101, in run
    raise RuntimeError("\n".join(error_bits))
RuntimeError: Couldn't clone CodeFormer.
Command: "git" clone "https://github.com/sczhou/CodeFormer.git" "/app/stable-diffusion-webui/repositories/CodeFormer"
Error code: 128
stderr: Cloning into '/app/stable-diffusion-webui/repositories/CodeFormer'...
fatal: unable to access 'https://github.com/sczhou/CodeFormer.git/': Could not resolve host: github.com


--> ERROR: process "/bin/sh -c /opt/venv/bin/python launch.py --exit --skip-torch-cuda-test --xformers" did not complete successfully: exit code: 1

I give up for tonight.

I was able to successfully reboot the space a couple of hours ago, but now we’re back to this:

---------------
Running script './on_start.sh' to download models ...
---------------
$ download-model --lora "BulkedUpAIR1.5.safetensors" "https://civitai.com/api/download/models/33323"


06/21 19:43:45 [ERROR] CUID#7 - Download aborted. URI=https://civitai.com/api/download/models/33323
Exception: [AbstractCommand.cc:351] errorCode=22 URI=https://civitai.com/api/download/models/33323
  -> [HttpSkipResponseCommand.cc:239] errorCode=22 The response status is not successful. status=403

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
126c3b|ERR |       0B/s|/app/stable-diffusion-webui/models/Lora/BulkedUpAIR1.5.safetensors

Status Legend:
(ERR):error occurred.

I restarted the space again, and it worked, so it is intermittent, whatever the problem is.
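Since the failure is intermittent, one possible workaround is to retry the download step a few times with a growing pause instead of restarting the whole space by hand. This is only a sketch: `run_with_retries` is a hypothetical helper, the attempt count and delays are arbitrary, and it assumes the command exits non-zero on failure (as the exit code 22 in these logs suggests).

```python
import subprocess
import time


def run_with_retries(cmd, attempts=4, base_delay=2.0,
                     runner=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits 0, pausing base_delay * 2**n seconds between tries.

    Returns the exit code of the last attempt (0 on success).
    runner and sleep are injectable so the loop can be tested without
    launching processes or actually waiting.
    """
    code = 1
    for attempt in range(attempts):
        code = runner(cmd).returncode
        if code == 0:
            return 0
        if attempt < attempts - 1:
            # Exponential backoff: 2s, 4s, 8s, ... with the defaults.
            sleep(base_delay * 2 ** attempt)
    return code
```

For example, `run_with_retries(["./on_start.sh"])` would try the start script up to four times before giving up and returning its last exit code.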

However, on another space, SD WebUI Plus Basics (a Hugging Face Space by MiroCollas), which is public, I get this:

06/21 19:50:40 [ERROR] CUID#7 - Download aborted. URI=https://civitai.com/api/download/models/76712
Exception: [AbstractCommand.cc:351] errorCode=22 URI=https://civitai.com/api/download/models/76712
  -> [HttpSkipResponseCommand.cc:239] errorCode=22 The response status is not successful. status=500

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
ff83c3|ERR |       0B/s|/app/stable-diffusion-webui/embeddings/FastNegativeEmbedding.pt

Status Legend:
(ERR):error occurred.

aria2 will resume download if the transfer is restarted.
If there are any errors, then see the log file. See '-l' option in help/man page for details.
Traceback (most recent call last):
  File "/app/stable-diffusion-webui/run.py", line 35, in <module>
    start()
  File "/app/stable-diffusion-webui/run.py", line 16, in start
    on_start()
  File "/app/stable-diffusion-webui/run.py", line 12, in on_start
    raise RuntimeError(f"Error executing ./on_start.sh [exit code: {result.returncode}]")
RuntimeError: Error executing ./on_start.sh [exit code: 22]

[sigh]

New errors, and as far as I can tell, they are all connection-related in one way or another.

--> RUN poetry install
Installing dependencies from lock file

Package operations: 15 installs, 0 updates, 0 removals

  • Installing certifi (2022.12.7)
  • Installing charset-normalizer (3.0.1)
  • Installing idna (3.4)
  • Installing typing-extensions (4.5.0)
  • Installing urllib3 (1.26.14)
  • Installing cmake (3.25.2)
  • Installing filelock (3.9.0)
  • Installing lit (15.0.7)
  • Installing numpy (1.24.2)
  • Installing pillow (9.4.0)
  • Installing requests (2.28.2)
  • Installing torch (1.13.1+cu117 https://download.pytorch.org/whl/cu117/torch-1.13.1%2Bcu117-cp310-cp310-linux_x86_64.whl)
  • Installing numexpr (2.8.4)
  • Installing torchvision (0.14.1+cu117 https://download.pytorch.org/whl/cu117/torchvision-0.14.1%2Bcu117-cp310-cp310-linux_x86_64.whl)
  • Installing triton (2.0.0)

  RuntimeError

  Hash for triton (2.0.0) from archive triton-2.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl not found in known hashes (was: sha256:e95ac47e08f205714bcae4f0f2b2acd9157f6ec33fa2598080cf4bbf37e5fcc3)

  at /usr/local/poetry/venv/lib/python3.10/site-packages/poetry/installation/executor.py:818 in _validate_archive_hash
      814│         archive_hash: str = "sha256:" + get_file_hash(archive)
      815│         known_hashes = {f["hash"] for f in package.files if f["file"] == archive.name}
      816│ 
      817│         if archive_hash not in known_hashes:
    → 818│             raise RuntimeError(
      819│                 f"Hash for {package} from archive {archive.name} not found in"
      820│                 f" known hashes (was: {archive_hash})"
      821│             )
      822│ 


--> ERROR: process "/bin/sh -c poetry install" did not complete successfully: exit code: 1
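The poetry error above means the SHA-256 of the downloaded wheel did not match any hash recorded in the lock file. On a flaky connection, a truncated or corrupted download is one possible cause, though that is an assumption and not something the log proves. A minimal sketch of the same kind of check, using only `hashlib`; `sha256_of` and `matches_known_hash` are illustrative names, not poetry functions:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_known_hash(path, known_hashes):
    """Compare a file against a set of 'sha256:<hex>' strings, the form shown in the error."""
    return "sha256:" + sha256_of(path) in known_hashes
```

If a freshly downloaded copy of the same file hashes differently on each attempt, that points at the transfer rather than at the lock file.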

I decided to delete space _A (see top post) and recreate it by duplicating _B (which was running at the time). New errors…

Python 3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0]
Version: v1.3.2
Commit hash: baf6946e06249c5af9851c60171692c44ef633e0
Installing gfpgan
Traceback (most recent call last):
  File "/app/stable-diffusion-webui/launch.py", line 38, in <module>
    main()
  File "/app/stable-diffusion-webui/launch.py", line 29, in main
    prepare_environment()
  File "/app/stable-diffusion-webui/modules/launch_utils.py", line 263, in prepare_environment
    run_pip(f"install {gfpgan_package}", "gfpgan")
  File "/app/stable-diffusion-webui/modules/launch_utils.py", line 124, in run_pip
    return run(f'"{python}" -m pip {command} --prefer-binary{index_url_line}', desc=f"Installing {desc}", errdesc=f"Couldn't install {desc}", live=live)
  File "/app/stable-diffusion-webui/modules/launch_utils.py", line 101, in run
    raise RuntimeError("\n".join(error_bits))
RuntimeError: Couldn't install gfpgan.
Command: "/opt/venv/bin/python" -m pip install https://github.com/TencentARC/GFPGAN/archive/8d2447a2d918f8eba5a4a01463fd48e45126a379.zip --prefer-binary
Error code: 1
stdout: Collecting https://github.com/TencentARC/GFPGAN/archive/8d2447a2d918f8eba5a4a01463fd48e45126a379.zip
  Downloading https://github.com/TencentARC/GFPGAN/archive/8d2447a2d918f8eba5a4a01463fd48e45126a379.zip (6.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.0/6.0 MB 61.9 MB/s eta 0:00:00
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Installing backend dependencies: started
  Installing backend dependencies: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting basicsr>=1.4.2 (from gfpgan==1.3.5)
  Downloading basicsr-1.4.2.tar.gz (172 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 172.5/172.5 kB 6.6 MB/s eta 0:00:00
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Installing backend dependencies: started
  Installing backend dependencies: finished with status 'error'

stderr:   error: subprocess-exited-with-error
  
  × pip subprocess to install backend dependencies did not run successfully.
  │ exit code: 1
  ╰─> [11 lines of output]
      Collecting numpy
        Downloading numpy-1.25.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.6 MB)
           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.6/17.6 MB 171.8 MB/s eta 0:00:00
      Collecting torch
        Downloading torch-2.0.1-cp310-cp310-manylinux1_x86_64.whl (619.9 MB)
           ╸                                      10.5/619.9 MB 144.6 MB/s eta 0:00:05
      ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
          torch from https://files.pythonhosted.org/packages/8c/4d/17e07377c9c3d1a0c4eb3fde1c7c16b5a0ce6133ddbabc08ceef6b7f2645/torch-2.0.1-cp310-cp310-manylinux1_x86_64.whl:
              Expected sha256 8ced00b3ba471856b993822508f77c98f48a458623596a4c43136158781e306a
                   Got        307e5752dbdc4dbb85d7e07196521111ece7ebee6247d9c574b5c3a4c9c6a970
      
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install backend dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Things have improved in the last few hours, but they’re not yet “good”. I still get “connection lost” errors while generating images. Refreshing the page generally gives me a “preparing space” page so I need to restart the space. And that has to be done several times, as I get download errors of the sort already posted above, until eventually it all fetches and I can restart the generation - from scratch.

So it turns out, part of my problems were civitai barfing on download requests. However, that leaves the problem of the space often giving a “connection lost” error, totally wiping out all work in progress.

Anyway, thanks for the suggestions and feedb…

[looks round at the empty room, hearing the occasional cricket outside]

Oh wait, never mind. :expressionless:

hi @anon22701748 I’m glad you’ve figured out the issue. Maybe send civitai feedback asking them to return an HTTP 429 Too Many Requests response code. That would be very helpful in the future.
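To illustrate why a 429 would help: a client can tell "slow down and try again" apart from "you are not allowed", and can honour a `Retry-After` header if the server sends one. The sketch below shows that decision only; `retry_delay` and `RETRYABLE` are hypothetical names, `headers` is assumed to be a plain dict with the exact key `Retry-After`, and only the integer-seconds form of the header is handled (the HTTP-date form falls back to exponential backoff).

```python
# Statuses treated as "worth retrying" by default. This set is an assumption.
RETRYABLE = frozenset({429, 500, 502, 503, 504})


def retry_delay(status, headers, attempt, base=2.0, retryable=RETRYABLE):
    """Return seconds to wait before retrying, or None if no retry makes sense."""
    if status not in retryable:
        return None
    retry_after = headers.get("Retry-After", "").strip()
    if retry_after.isdigit():
        # The server said how long to wait, so use that.
        return float(retry_after)
    # Otherwise fall back to exponential backoff: base, 2*base, 4*base, ...
    return base * 2 ** attempt
```

With the defaults a 403 is treated as final, which is exactly the ambiguity in this thread. Passing `retryable=RETRYABLE | {403}` would retry it too, on the assumption that the 403s here were really rate limiting.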

Good idea, thank you.
But that is only part of the problem. As noted previously, I often get “connection lost” errors in the space, which wipes out the work in progress.

[Added:] Might I also suggest something like this

so if there are general system issues, users can be made aware? It might cut back on some of the forum posts. :slight_smile:

yes we have a status page https://status.huggingface.co/

Oh! Thanks! I have looked for one and not found it. Bookmarked. :slight_smile:
