Connecting to hub account from Databricks

I’m running transformers in a Databricks notebook with a local dataset. The task is text classification with BERT and DistilBERT. I have no problem loading public checkpoints from the Hub and fine-tuning them. The problem comes when I want to push the model back to my account on the Hub. The cell

from huggingface_hub import notebook_login
notebook_login()

doesn’t allow me to log in. Figuring I needed to display HTML, I tried

displayHTML(notebook_login())

but this gives me a Java NullPointerException. Has anybody succeeded in connecting to their account from a Databricks notebook?

Thanks

Hi Alun!

Today I ran into the same challenge. Unfortunately, Hugging Face seems to lack proper documentation on how to log in to the Hugging Face Hub from within a Databricks notebook. I was unsuccessful with the currently documented huggingface-cli login, notebook_login() and HfApi.set_access_token(), but I did manage to log in and push models and datasets to the Hub through a hacky use of a method that is, or is about to become, deprecated.

Start by installing huggingface_hub:
%pip install huggingface_hub

Then install git-lfs:

%sh
# https://stackoverflow.com/questions/48734119/git-lfs-is-not-a-git-command-unclear
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
# -y answers the confirmation prompt, since a %sh cell has no interactive stdin
sudo apt-get install -y git-lfs

Then, continue with logging in through:

from huggingface_hub.commands.user import _login
from huggingface_hub import HfApi

api = HfApi()
_login(hf_api = api, username = "USER", password = "PASS")

And:

%sh
git config --global credential.helper store
# https://github.com/huggingface/notebooks/blob/main/examples/language_modeling_from_scratch-tf.ipynb
git config --global user.email "EMAIL"
git config --global user.name "FULLNAME"

This worked for me, though it took quite some trial and error to figure out. Pushing models still behaves a bit strangely because of occasional git errors.

I hope this helps!

It seems _login() has since been updated. You can still follow the login procedure above for the most part, but you need to pass a token to _login() instead of a username/password combination:

from huggingface_hub.commands.user import _login
from huggingface_hub import HfApi

api = HfApi()
_login(hf_api = api, token = "TOKEN")

Tokens can be generated on the Access Tokens page of your Hugging Face profile.
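
To avoid pasting the token into the notebook itself, it can be read from the environment. This is only a minimal sketch and not something from the posts above: the helper name read_hub_token and the HF_TOKEN variable name are assumptions for illustration. On Databricks the value could just as well come from a secret scope through dbutils.secrets.get, with whatever scope and key names you have configured.

```python
import os


def read_hub_token(env_var: str = "HF_TOKEN") -> str:
    """Return the Hub token stored in an environment variable.

    The helper name and the default variable name are illustrative
    assumptions, not part of huggingface_hub.
    """
    # Strip whitespace so a trailing newline from copy/paste does not
    # end up inside the token.
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"No Hub token found in environment variable {env_var!r}")
    return token
```

The returned string can then be passed as the token argument of the _login() call above.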

Thanks. This looks reasonable, but I haven’t been able to get it to work yet. I removed huggingface_hub from Databricks and reinstalled it (from PyPI). Looking at the code on GitHub, I can see what you are doing, and token is a keyword argument (although hf_api isn’t).
_login(api, token="TOKEN")
However, I’m getting the same problem. I checked when the PyPI release was last updated, and it appears to be after the last change on GitHub. Just to be clear, I am using my actual token there, not the string “TOKEN”. My guess is that Databricks didn’t actually upgrade the library.
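
One way to test that guess is to ask the running Python process which version it actually sees. A minimal sketch using only the standard library; the helper name installed_version is made up for this example:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version visible to this notebook's Python process.
print("huggingface_hub:", installed_version("huggingface_hub"))
```

If the printed version is older than the one just installed, the notebook's Python process is probably still using the previous install and may need a restart before the new code is picked up.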

I went through the code above, but it did not work for me. I think this is because of newer updates to the library, but you can import _login directly from _login.py:

from huggingface_hub._login import _login
_login(token='your token as string', add_to_git_credential=False)
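
Since _login is a private function whose location has changed between releases, the public login function may be a more stable alternative. The following is a sketch under the assumption that a reasonably recent huggingface_hub is installed, one that exports login and HfApi.whoami. The wrapper name login_and_check is hypothetical.

```python
def login_and_check(token: str) -> str:
    """Log in through the public API and return the account name the Hub reports."""
    if not token or not token.strip():
        raise ValueError("token must be a non-empty string")
    token = token.strip()

    # Imported inside the function so the argument check above still runs
    # even if the library is not installed.
    from huggingface_hub import HfApi, login

    # Stores the token locally so later Hub calls can reuse it.
    login(token=token)

    # whoami() raises an error if the Hub rejects the token, which makes
    # it a quick way to confirm the login worked.
    return HfApi().whoami(token=token)["name"]
```

Calling login_and_check with your real token should return your Hub username if the login succeeded.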