I was trying out the AutoTrain platform by fine-tuning a model on a dataset (related to, but different from, the one used to fine-tune the previous checkpoints). I don’t understand why all models were stopped after hours of training, regardless of the metrics they had achieved (some models showed significantly better results than others).
Do you have any idea what might be going on? I have also tried to reach you guys at autonlp@huggingface.co with more details, including the project ID, but figured it was worth asking here as well in case anyone else is experiencing similar issues.
Thank you in advance for your time!
Hi! Thanks for reporting, and sorry for the wait. It can be expected for a Space to restart from time to time if there isn’t enough memory (RAM) for the training. In cases like this, we recommend using a larger instance. Please let us know if you have any other questions! Thanks again.