Inference widget not loading model

The inference widget for text generation stays stuck at model loading for a while and eventually fails with a “model time out” error.

This happens with every model I fine-tuned with LoRA using Unsloth and pushed to the Hub merged to float16, for example: bmi-labmedinfo/Igea-1B-Instruct-v0.1
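
For context, the merge-and-push step looked roughly like this (a minimal sketch; the adapter path and sequence length are illustrative, and I'm assuming Unsloth's `push_to_hub_merged` with `save_method="merged_16bit"`, which is the float16 merge path I used):

```python
from unsloth import FastLanguageModel

# Load the LoRA-finetuned checkpoint (adapter path is illustrative)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="lora_adapters",
    max_seq_length=2048,
    load_in_4bit=False,
)

# Merge the adapters into the base weights and upload as float16
model.push_to_hub_merged(
    "bmi-labmedinfo/Igea-1B-Instruct-v0.1",
    tokenizer,
    save_method="merged_16bit",
)
```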

Other info:

  • same issue with both gated and ungated models
  • no issue running the model locally with AutoModelForCausalLM.from_pretrained() followed by model.generate() (see the snippet after this list)
  • no issue with the quantized versions running in HF Spaces
  • the browser console shows a ‘503 (Service Unavailable)’ error after the page loads, then a ‘504 (Gateway Timeout)’
  • the problem has persisted since last week
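
This is roughly the local setup that works fine (a minimal sketch; the prompt and generation parameters are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bmi-labmedinfo/Igea-1B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # the merged weights were pushed as float16
    device_map="auto",
)

inputs = tokenizer("Ciao, come stai?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Locally this generates text without errors, so the weights on the Hub seem fine; only the hosted inference widget fails.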