Chapter 4 questions

hey @dk-crazydiv thanks for your specific questions!

as far as i know, “no limits” currently applies to your list as long as it doesn’t conflict with our current terms of service: Terms Of Service – Hugging Face

under the “termination” section you can find information which relates to your question about “right to ban”.

hth and i suggest getting in touch with the legal team if something remains unclear as they’re much better placed to answer these questions :slight_smile:


Hi there,

Having issues with trainer.push_to_hub(repo_url='…') from Colab. I logged in using notebook_login and installed Git LFS with no issues, but when pushing the trainer to an existing model I trained last week, it keeps loading, stuck on "Several commits (4) will be pushed upstream. The progress bar may be unreliable." It has been loading for almost 30 minutes.

I basically followed the summarization part of the course: Main NLP tasks - Hugging Face Course

Hey @nickmuchi thanks for raising this issue! Internally we’ve also experienced slow uploads from Colab to the Hub and are currently looking into it.

If you’re able to create a minimal reproducible example as a Colab notebook, it would be really helpful to share that as a GitHub issue on the transformers repo :slight_smile:

I found what the issue was: the output_dir where my checkpoints were being saved was a folder on my mounted Google Drive. When I moved it outside of my_drive, it worked with no issues. Maybe something to highlight in the course, given the benefits of saving checkpoints to Drive in case the GPU runtime is disconnected.

Thanks for your response.


Thanks for the extra information @nickmuchi !

Just for my understanding, your output_dir was a folder within the Colab instance itself, i.e. the default behaviour that one gets by opening a fresh Colab instance? Or are you saying that the problem arises when you mount an external drive to the Colab instance?

Yes, the problem happens when my output_dir is on my mounted drive (/content/drive/my_drive/output_dir), but if it is just in the Colab instance (output_dir) it works fine. So while I was training and saving checkpoints, the model was not being pushed to the Hub after each epoch, and after I had finished training I tried pushing the trainer to the Hub with no success. It only worked once I copied the model/tokenizer files from the mounted drive into the Colab instance.
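The workaround described above can be sketched as a small copy step before pushing. A minimal sketch, assuming the Drive and instance paths mentioned in this thread (adjust them to your own setup):

```python
import shutil
from pathlib import Path

def copy_checkpoints(drive_dir: str, local_dir: str) -> Path:
    """Copy a checkpoint folder from the mounted Drive into the Colab
    instance's own filesystem, where push_to_hub works reliably."""
    dst = Path(local_dir)
    # dirs_exist_ok lets the copy be re-run after further training epochs
    shutil.copytree(drive_dir, dst, dirs_exist_ok=True)
    return dst

# In Colab this would be (paths are assumptions from this thread):
# copy_checkpoints("/content/drive/my_drive/output_dir", "/content/output_dir")
# and then push from the local copy.
```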

Great, thanks for the full context - I’ll add a note in the course so that others don’t run into the same issue!


The "upload_file approach" section is missing the path_or_fileobj parameter in its example.

Thanks for reporting @sandeshrajx ! Would you mind sharing the URL of the section where this happens?

Quick question: if I use the Trainer API’s push_to_hub method, does this only push the model and not the tokenizer? So should the default procedure be:

  1. initialize tokenizer → tokenizer.push_to_hub()
  2. train model → trainer.push_to_hub()

or am I missing something? When I pushed the trainer, it started complaining that the tokenizer was not present in the repo.

Thanks in advance!
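For what it's worth, one way to avoid the separate tokenizer push is to hand the tokenizer to the Trainer itself at construction time; it then uploads the tokenizer files alongside the model. A minimal sketch, assuming the transformers Trainer API (model, dataset, and argument names are placeholders, and the heavyweight calls are left commented out):

```python
import inspect
from transformers import Trainer

# The Trainer constructor accepts the tokenizer directly; when it is given,
# trainer.push_to_hub() uploads the tokenizer files alongside the model.
assert "tokenizer" in inspect.signature(Trainer.__init__).parameters

# Usage sketch (placeholders, commented so nothing heavy runs here):
# trainer = Trainer(
#     model=model,               # your model
#     args=training_args,        # TrainingArguments(..., push_to_hub=True)
#     train_dataset=train_dataset,
#     tokenizer=tokenizer,       # <- include this; no separate tokenizer.push_to_hub() needed
# )
# trainer.train()
# trainer.push_to_hub()          # pushes weights, config, and tokenizer files
```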

from huggingface_hub import create_repo

create_repo("dummy-model", organization="PISC")

TypeError: HfApi.create_repo() got an unexpected keyword argument 'organization'
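For reference, in recent versions of huggingface_hub the organization keyword was removed; the organization is instead given as part of repo_id in "namespace/name" form. A minimal sketch (the actual Hub call is left commented since it needs authentication):

```python
def org_repo_id(organization: str, name: str) -> str:
    """Build the 'namespace/name' repo_id that recent huggingface_hub expects."""
    return f"{organization}/{name}"

repo_id = org_repo_id("PISC", "dummy-model")
print(repo_id)  # PISC/dummy-model

# from huggingface_hub import create_repo
# create_repo(repo_id)  # instead of create_repo("dummy-model", organization="PISC")
```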

from huggingface_hub import upload_file

upload_file(
    "<path_to_file>/config.json",
    path_in_repo="config.json",
    repo_id="/dummy-model",
)

upload_file(
    path_or_fileobj=".\local\file\path",
    path_in_repo="remote/file/path.h5",
    repo_id="username/my-model",
    token="my_token",
    create_pr=True,
)
https://huggingface.co/username/my-model/blob/refs/pr/1/remote/file/path.h5

path_or_fileobj (str, Path, bytes, or IO):
Path to a file on the local machine or binary data stream / fileobj / buffer.

How can I do this on Colab?
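On Colab, upload_file works the same way once you are logged in via notebook_login(). A minimal sketch, assuming huggingface_hub is installed and the file lives in the Colab instance's own filesystem (path and repo names are placeholders, and the network calls are left commented):

```python
# Placeholder path and repo name: adjust to your own file and namespace.
upload_kwargs = dict(
    path_or_fileobj="/content/config.json",  # a file in the Colab instance, not on Drive
    path_in_repo="config.json",              # destination path inside the repo
    repo_id="username/my-model",             # "user/name" or "org/name", no leading slash
)
print(sorted(upload_kwargs))

# In a Colab cell:
# from huggingface_hub import notebook_login, upload_file
# notebook_login()             # paste your Hub token when prompted
# upload_file(**upload_kwargs)
```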


I can't log in to Hugging Face from my Jupyter notebooks in Visual Studio Code. It just shows "connecting…".