hey @dk-crazydiv thanks for your specific questions!
As far as I know, “no limits” currently applies to your list as long as it doesn’t conflict with our current terms of service: Terms Of Service – Hugging Face
Under the “Termination” section you can find information relating to your question about the “right to ban”.
Hope that helps! I’d suggest getting in touch with the legal team if anything remains unclear, as they’re much better placed to answer these questions.
Having issues with trainer.push_to_hub(repo_url=’…’) from Colab. I logged in using notebook_login and installed Git LFS with no issues, but when pushing the trainer to an existing model I trained last week, it keeps loading and gets stuck on "Several commits (4) will be pushed upstream. The progress bars may be unreliable." It has been loading for almost 30 minutes now.
Hey @nickmuchi thanks for raising this issue! Internally we’ve also experienced slow uploads from Colab to the Hub and are currently looking into it.
If you’re able to create a minimal reproducible example as a Colab notebook, it would be really helpful to share that as a GitHub issue on the transformers repo.
I found what the issue was: my output_dir, where my checkpoints were being saved, was a folder in my mounted Google Drive. When I moved it outside of my Drive it worked with no issues. Maybe something to highlight in the course, given the benefits of saving checkpoints in Drive in case the GPU runtime is disconnected.
Just for my understanding, your output_dir was a folder within the Colab instance itself, i.e. the default behaviour that one gets by opening a fresh Colab instance? Or are you saying that the problem arises when you mount an external drive to the Colab instance?
Yes, the problem happens when my output_dir is in my mounted drive (/content/drive/my_drive/output_dir), but if it is just in the Colab instance (output_dir) it works fine. So while I was training and saving checkpoints, it was not pushing the model to the Hub after each epoch, and after I finished training I tried pushing the trainer to the Hub with no success. It only worked once I copied the model/tokenizer files from the mounted drive to the Colab instance.
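For anyone hitting the same thing, the workaround described above can be sketched as a small copy step before pushing. This is just an illustration: the function name is made up, and the paths are the ones from this thread (your Drive mount path may differ, e.g. MyDrive vs my_drive).

```python
import shutil
from pathlib import Path


def copy_checkpoint_to_local(drive_dir: str, local_dir: str) -> Path:
    """Copy saved model/tokenizer files from a mounted-Drive folder to
    the Colab instance's own disk, returning the local path."""
    dest = Path(local_dir)
    # dirs_exist_ok lets us re-run the copy into an existing folder
    shutil.copytree(drive_dir, dest, dirs_exist_ok=True)
    return dest


# Paths as in the post (illustrative):
# local_out = copy_checkpoint_to_local(
#     "/content/drive/my_drive/output_dir", "/content/output_dir"
# )
# Pushing from the local copy avoids the mounted-drive slowdown.
```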
Quick question: if I use the Trainer API’s push_to_hub method, does this only push the model and not the tokenizer? So should the default procedure be:
initialize tokenizer → tokenizer.push_to_hub()
train model → trainer.push_to_hub()
or am I missing something? When I pushed the trainer, it started complaining that the tokenizer was not present in the repo.
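From what I understand, the Trainer only uploads the tokenizer if it knows about it, i.e. if you pass the tokenizer when constructing the Trainer. A minimal sketch, where the checkpoint name and repo id are just placeholders:

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# "bert-base-uncased" is an illustrative checkpoint, not a recommendation
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

args = TrainingArguments("output_dir", push_to_hub=True)

trainer = Trainer(
    model=model,
    args=args,
    tokenizer=tokenizer,  # without this, push_to_hub() uploads only the model
)

# ... trainer.train() ...
# trainer.push_to_hub()  # should now include the tokenizer files

# Alternatively, push the tokenizer yourself:
# tokenizer.push_to_hub("your-username/your-model")  # repo id is a placeholder
```

So the two-step procedure in the question works, but passing the tokenizer to the Trainer lets a single trainer.push_to_hub() cover both.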