I’m running transformers in a Databricks notebook with a local dataset. The task is text classification with BERT and DistilBERT. I have no problem loading public checkpoints from the hub and fine-tuning them. The problem comes when I want to push the model back to my account on the hub. The cell
from huggingface_hub import notebook_login
notebook_login()
doesn’t let me log in. Figuring I needed to display HTML, I tried
displayHTML(notebook_login())
but this gives me a Java NullPointerException. Has anybody succeeded in connecting to their account from a Databricks notebook?
Thanks
Hi Alun!
Today I ran into the same challenge. Unfortunately, Hugging Face seems to lack proper documentation on how to log in to the Hugging Face Hub from within a Databricks notebook. While I was unsuccessful in logging in through the currently documented huggingface-cli login, notebook_login() and HfApi.set_access_token(), I did manage to log in and push models and datasets to the Hugging Face Hub through a hacky use of what is, or is set to become, a deprecated method.
Start with installing Hugging Face Hub through:
%pip install huggingface_hub
And installing git-lfs through:
%sh
# https://stackoverflow.com/questions/48734119/git-lfs-is-not-a-git-command-unclear
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
# -y answers the confirmation prompt, since a %sh cell cannot take interactive input
sudo apt-get install -y git-lfs
Then, continue with logging in through:
from huggingface_hub.commands.user import _login
from huggingface_hub import HfApi
api = HfApi()
_login(hf_api = api, username = "USER", password = "PASS")
And:
%sh
git config --global credential.helper store
# https://github.com/huggingface/notebooks/blob/main/examples/language_modeling_from_scratch-tf.ipynb
git config --global user.email "EMAIL"
git config --global user.name "FULLNAME"
This worked for me, and it took quite some trial and error to figure out. Pushing models still behaves a bit weirdly because of occasional git errors.
I hope this helps!
It seems _login() has since been updated. You can still follow the login procedure above for the most part, but you need to pass a token to _login() instead of a username/password combination:
from huggingface_hub.commands.user import _login
from huggingface_hub import HfApi
api = HfApi()
_login(hf_api = api, token = "TOKEN")
Tokens can be generated on the Access Tokens page of your Hugging Face profile.
Thanks. This looks reasonable, but I haven’t been able to get it to work yet. I removed huggingface_hub from Databricks and reinstalled it (from PyPI). Looking at the code on GitHub, I can see what you are doing, and token is a keyword argument (although hf_api isn’t), so I called it like this:
_login(api, token="TOKEN")
However, I’m getting the same problem. I checked when PyPI was last updated, and it appears to be after the last change on GitHub. Just to be clear, I am using my actual token there, not the string “TOKEN”. My guess is that Databricks didn’t actually upgrade the library.
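One way to test that guess is to ask the running Python process which version it actually sees. This is only a minimal sketch using the standard library; the helper name installed_version is made up for this example.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string visible to this interpreter, or None if the package is absent.
print("huggingface_hub:", installed_version("huggingface_hub"))
```

If the printed version is older than the latest release on PyPI, the notebook is probably still importing the old install.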
I went through the code above, but it did not work for me. I think this is because of recent updates, but you can import _login directly from _login.py:
from huggingface_hub._login import _login
_login(token='your token as string', add_to_git_credential=False)
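Since _login is a private function that has already moved once in this thread, it may be safer to call the public login function that recent huggingface_hub releases export at the top level. Below is a minimal sketch, assuming the token is stored in an environment variable. The variable name HF_TOKEN and the helpers read_token and login_to_hub are choices made for this example, not anything from the posts above; on Databricks you might read the token from a secret scope instead of the environment.

```python
import os
from typing import Mapping, Optional


def read_token(env: Mapping[str, str] = os.environ, var: str = "HF_TOKEN") -> Optional[str]:
    """Return the token stored in `var`, stripped of whitespace, or None if unset or empty."""
    token = env.get(var, "").strip()
    return token or None


def login_to_hub(token: str) -> None:
    """Log in through the public API instead of the private _login helper."""
    # Imported lazily so read_token() works even where huggingface_hub is not installed.
    from huggingface_hub import login

    # add_to_git_credential=False skips the git credential helper, as in the post above.
    login(token=token, add_to_git_credential=False)


if __name__ == "__main__":
    token = read_token()
    if token is None:
        print("HF_TOKEN is not set; skipping login.")
    else:
        login_to_hub(token)
```

Keeping the token out of the notebook source this way also avoids pasting it into a cell as a string literal.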