Spaces: Launch timed out, space was not healthy after 30 min

Hi,

I am currently trying to create an A100 Space with a fine-tuned Falcon-40b model as a Gradio app. Everything works until the automatic model download starts. The model consists of 18 shards that take approximately 2.5 minutes each to download, which leads to a build/start time of approximately 40-45 minutes overall.

After exactly 30 minutes the Space fails with the error message “Launch timed out, space was not healthy after 30 min”, which makes sense but is not what I want.
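As a quick check of the numbers in the post, here is a minimal sketch (the variable names are illustrative and not part of any Spaces API) showing why the download cannot finish inside the 30-minute health window:

```python
# Figures taken from the post above; the names are illustrative only.
SHARDS = 18
MINUTES_PER_SHARD = 2.5
HEALTH_TIMEOUT_MINUTES = 30

# Total sequential download time and how far it overshoots the limit.
download_minutes = SHARDS * MINUTES_PER_SHARD
overrun_minutes = download_minutes - HEALTH_TIMEOUT_MINUTES

print(f"estimated download: {download_minutes:.0f} min")
print(f"over the limit by:  {overrun_minutes:.0f} min")
```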

What can I do to prevent this behaviour?

Kind regards
Julian

hi @JulianGerhard ,

Thank you for raising this issue. We currently have an ongoing internal discussion about making the health check timeout configurable. In the meantime, we recommend using the Docker Space SDK with a custom Dockerfile and downloading the model at build time, in a Docker layer. Please let me know if you need any assistance with this. Here is an example: Dockerfile · radames/Falcon-40b-Dockerfile at main.
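The build-time download suggested above could be done with a small Python script that the Dockerfile runs in a `RUN` step. The following is only a sketch under assumptions and is not the content of the linked example: `fetch_model` and the `download_fn` parameter are hypothetical names, and the repo ID and target directory are supplied by the caller. It relies on `snapshot_download` from the `huggingface_hub` package, which must be installed in the image.

```python
import sys
from pathlib import Path


def fetch_model(repo_id, target_dir, download_fn=None):
    """Download all files of `repo_id` into `target_dir` and return the path."""
    if download_fn is None:
        # Imported lazily so the function can also be exercised with a stub.
        from huggingface_hub import snapshot_download
        download_fn = snapshot_download
    # Make sure the destination exists before the download starts.
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    return download_fn(repo_id=repo_id, local_dir=target_dir)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical invocation: python download_model.py <repo_id> <target_dir>
    print(fetch_model(sys.argv[1], sys.argv[2]))
```

Because the files are then baked into an image layer, the Space should not need to download them again at startup, so the download no longer counts against the health check window.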


Hello @radames. I have a similar question. My Space builds from a Dockerfile and then has to download a dataset from another Hugging Face dataset repo of mine, which takes more than 30 minutes, so the same error occurs. Is there a workaround? I am new to Hugging Face…

Also, when I restart the Space, it successfully downloads the dataset but then throws a memory error (more than 16GB), which makes sense. But does that mean I have to go for a paid version, or is there a workaround?

hi @ridasagheer ,

We have an undocumented parameter to customize the startup duration timeout:

startup_duration_timeout: 1h
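The reply does not say where this line goes. Space settings are normally read from the YAML block at the top of the Space's README.md, so that placement is assumed here. As a sketch, the helper below (`set_startup_timeout` is a hypothetical name, not part of any Hugging Face library) adds or updates the entry in a README string using plain string handling:

```python
def set_startup_timeout(readme_text, timeout="1h"):
    """Return `readme_text` with startup_duration_timeout set in its YAML block."""
    key = "startup_duration_timeout"
    entry = f"{key}: {timeout}"
    lines = readme_text.split("\n")
    # No YAML block yet: create one that holds only the timeout entry.
    if lines[0].strip() != "---":
        return f"---\n{entry}\n---\n{readme_text}"
    # Locate the line that closes the YAML block.
    close = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if close is None:
        raise ValueError("README front matter is not closed with '---'")
    for i in range(1, close):
        if lines[i].startswith(key + ":"):
            lines[i] = entry  # overwrite an existing value
            break
    else:
        lines.insert(close, entry)  # append just before the closing '---'
    return "\n".join(lines)


readme = "---\ntitle: Demo\nsdk: docker\n---\n# Demo\n"
print(set_startup_timeout(readme, "1h"))
```

Editing README.md by hand works just as well; the point is only that the key sits next to the other settings such as `sdk`.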

If you’re using Docker, you could alternatively download and pack the dataset at Docker build time, so your Space will boot up with immediate access to the data.

Yes, the free CPU tier offers 16GB RAM; if you need more, you’ll need to upgrade to CPU Upgrade, which offers 32GB RAM.

Hey, I am facing a similar issue.
But now that the Space is using the Docker SDK, it throws an error for the gradio import in app.py:

ModuleNotFoundError: No module named 'gradio'
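A likely cause, stated as an assumption since the Dockerfile was not shared: with the Docker SDK, only what the Dockerfile installs is present in the image, so gradio has to be installed there explicitly (for example with pip). The sketch below uses the standard library to report which top-level modules are not importable in the current environment; `missing_modules` and the `required` list are illustrative names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical list; use whatever app.py actually imports.
required = ["gradio"]
missing = missing_modules(required)
if missing:
    print("not installed:", ", ".join(missing))
```

Running this inside the container before app.py starts would show whether the image lacks the package or whether the app is simply started with a different Python interpreter.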

Could you please share more details about your Docker Space?