The inference widget for text generation stays stuck at model loading for a while and eventually fails with a “model time out” error.
This happens for every model I fine-tuned with LoRA using Unsloth and pushed to the Hub merged to float16, for example: bmi-labmedinfo/Igea-1B-Instruct-v0.1
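For context, the merge-and-push step looks roughly like this (a sketch, not my exact script; the adapter directory, sequence length, and token are placeholders, and I'm assuming Unsloth's `FastLanguageModel` / `push_to_hub_merged` API here):

```python
from unsloth import FastLanguageModel

# Load the LoRA-fine-tuned model (directory and values are placeholders)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="lora_output_dir",  # directory containing the trained adapters
    max_seq_length=2048,
    load_in_4bit=True,
)

# Merge the adapters into float16 weights and push the result to the Hub
model.push_to_hub_merged(
    "bmi-labmedinfo/Igea-1B-Instruct-v0.1",
    tokenizer,
    save_method="merged_16bit",
    token="hf_...",  # placeholder token
)
```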
Other info:
- same issue for gated and ungated models
- no issue running locally with AutoModelForCausalLM.from_pretrained() followed by model.generate() (minimal snippet below)
- no issue with quantized versions running in HF Spaces
- the browser console shows ‘503 (Service Unavailable)’ after the page loads, then ‘504 (Gateway Timeout)’
- the problem has persisted since last week
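For reference, this is roughly the local setup that works without issues (a minimal sketch; the prompt and generation parameters are placeholders, not my exact ones):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bmi-labmedinfo/Igea-1B-Instruct-v0.1"

# Load tokenizer and merged float16 model from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Run a simple generation; this works fine locally
inputs = tokenizer("Ciao, come stai?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```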