I am writing to express my frustration regarding a training job I launched for the model Gragroo/Solenai-v-0-4, which was supposed to take approximately 77 hours. However, it unexpectedly stopped after about 44 hours without any explanation.
I feel quite disappointed, especially after investing nearly 80 euros into this training. I have not received any information in my Hugging Face mailbox, and if I hadn’t checked the job’s progress myself, I would have been completely unaware of the issue.
I would appreciate your prompt attention to this matter and any insights you can provide about why the training stopped and how I can resolve this issue.
Thank you for your assistance.
@meganariley @not-lain It’s a money related issue.
Hey. If you have a very long-running training job and you set a low sleep time, the Space might go to sleep before training ends unless you keep checking the UI manually, which is what I think happened here.
email us with more details at autotrain @ hf.co and we can take a deeper look.
Hi,
I sent an email to @ hf.co at the same time.
Best regards
yes. i saw it after responding. we will get back with more information shortly.
After checking internally, it seems that sleep time kicked in and the Space went into sleep mode because it had no HTTP traffic. Users with long-running jobs are expected to set an appropriate sleep time on the training Space so that it does not go to sleep while training is ongoing. More information on sleep time can be found here: Using GPU Spaces
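For anyone who prefers to set this from a script, here is a minimal sketch of the idea, not an official AutoTrain recipe. It assumes the `huggingface_hub` Python client is installed and logged in, and that its `HfApi.set_space_sleep_time` method is available in your version. The helper names `sleep_time_for_job` and `apply_sleep_time`, the 25% margin, and the Space ID placeholder are all made up for illustration. Since sleep time counts inactivity rather than total runtime, picking a value longer than the whole job is just a conservative choice.

```python
import math


def sleep_time_for_job(expected_hours: float, margin: float = 0.25) -> int:
    """Return a sleep time in seconds that outlasts a job of the given length.

    `margin` is an illustrative safety factor (0.25 = 25% extra time).
    """
    if expected_hours <= 0:
        raise ValueError("expected_hours must be positive")
    if margin < 0:
        raise ValueError("margin must not be negative")
    return math.ceil(expected_hours * (1 + margin) * 3600)


def apply_sleep_time(space_id: str, seconds: int) -> None:
    """Set the sleep time on a Space (assumes upgraded, non-free hardware)."""
    # Imported here so the helper above works without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi()  # picks up the locally stored token or the HF_TOKEN variable
    api.set_space_sleep_time(repo_id=space_id, sleep_time=seconds)


if __name__ == "__main__":
    seconds = sleep_time_for_job(77)  # the 77-hour job from this thread
    print(f"Suggested sleep time: {seconds} seconds")
    # Hypothetical Space ID; replace with your own training Space before running.
    # apply_sleep_time("your-username/your-autotrain-space", seconds)
```

The call to `apply_sleep_time` is left commented out so nothing changes on your account until you fill in your own Space ID.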
Hope that answers your question.