Starting up T4 instances is taking 45+ minutes

KyleNotTaken · February 7, 2025, 7:15pm

Last year, I was using T4 (small and medium) instances on Hugging Face to fine-tune some datasets. Back then, I remember it only taking maybe 10 minutes to start up a T4 instance.

Today, I tried doing the same in my Space, starting both a T4 Medium and a T4 Small instance. Either one now takes nearly an hour to start up.

Or, after 30 minutes I’ll get this error:

runtime error
Scheduling failure: unable to schedule

I haven’t seen anything related on the status page today.

Is Hugging Face just resource constrained right now because of DeepSeek and the other releases this past week? Or is there something else going on?

Is there a page that I can check to know the current load on Hugging Face?

Thanks!

FantasyInc · February 7, 2025, 7:45pm

This is occurring for me on A100 instances as well, so it’s not limited to specific machines.

I’m receiving the same “Scheduling failure: unable to schedule” error after my endpoints sit in the Initializing state for quite a few minutes. This behavior is the same on multiple endpoints today.

I’ll keep retrying, but I’m hoping to hear some guidance; otherwise I’ll need to shop around.

FantasyInc · February 7, 2025, 8:16pm

This is in the us-east-1 (N. Virginia) region, by the way.

meganariley · February 7, 2025, 10:58pm

@FantasyInc @KyleNotTaken Thanks for reporting! This should be fixed now, but please let us know if you continue running into any issues.

FantasyInc · February 7, 2025, 11:14pm

@meganariley I’m retrying and will confirm. Note it’s still initializing for a much longer time than I’ve experienced before.

FantasyInc · February 7, 2025, 11:37pm

@meganariley It’s failed again for me after retrying and a long time initializing.

John6666 · February 8, 2025, 5:16am

Looking at the symptoms, it seems that this case and the series of problems below are connected. Well, if any of them are fixed, the rest will be fixed too.

FantasyInc · February 10, 2025, 1:59pm

I’m confirming that my AWS us-east-1 endpoint is now running. I was able to kick it off this morning, and it initialized in the more typical timeframe I was accustomed to.

If anyone else had similar issues on AWS machines, they may have been caused by the same problem with provisioning those machines.
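
For anyone who ends up retrying by hand during an incident like this, here is a minimal sketch of polling with a time limit instead of watching the page. It is not tied to any Hugging Face API: `get_status` is a hypothetical callable you supply yourself, and the status strings "running" and "failed" are assumptions for illustration, so adjust them to whatever your client actually reports.

```python
import time
from typing import Callable


def wait_until_running(
    get_status: Callable[[], str],  # hypothetical: returns the current status string
    timeout_s: float = 1800.0,      # give up after 30 minutes by default
    poll_s: float = 30.0,           # seconds to wait between status checks
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_status() until it reports "running".

    Returns True once the status is "running", and False if the status
    becomes "failed" or the timeout elapses first. The status names are
    assumptions, not taken from any particular service.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == "running":
            return True
        if status == "failed":
            return False
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

Passing `sleep` and `clock` as parameters is only there so the loop can be exercised without real waiting; in normal use the defaults are fine.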
