Upload a TF model to Huggingface

Kamel · September 1, 2021, 11:10am

Hi,
I am pre-training a BERT model from scratch using TensorFlow.
I’ve seen the method for pushing PyTorch models, but I don’t know how to do it with my TF model.

Here is what I think I need to do:
1- Convert my checkpoint from TF to PyTorch
2- Push to HF

Is this correct?

I also have another question:
I don’t know how to push the tokenizer. All I have is:

- vocab.txt
- tokenizer.vocab
- tokenizer.model

How can I do this, please?

Thanks

nielsr · September 1, 2021, 1:14pm

If you pre-trained BERT from scratch in TF using the run_mlm.py script, you can easily convert the model from TF to PyTorch, like so:

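from transformers import BertForMaskedLM

# from_tf=True loads the TensorFlow weights into the PyTorch model class
# (both TensorFlow and PyTorch need to be installed for this)
model = BertForMaskedLM.from_pretrained("name of directory where the run_mlm.py script saved all files", from_tf=True)
# Write the PyTorch weights and the config to a new directory
model.save_pretrained("name of directory where you'd like to save all model files")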

Next, you can easily push it to the hub as follows (I’m assuming you’re in a Colab notebook):

First, install git-LFS and set your git identity:

!sudo apt-get install git-lfs
!git lfs install
!git config --global user.email "<your email>"
!git config --global user.name "<your name>"

Next, create a repo on the hub, then git clone it:

!git clone <URL of your repository on the hub>

Next, copy your model files into the cloned directory, then add, commit, and push them from inside that directory:

git add .
git commit -m "First commit"
git push
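
As an alternative to the git workflow above, the files can also be uploaded from Python. The following is only a minimal sketch, not the method described in this answer: it assumes the `huggingface_hub` package is installed and that you are already logged in with a write token. The function names `list_upload_files` and `push_folder`, and the `folder` and `repo_id` arguments, are hypothetical names chosen for illustration.

```python
from pathlib import Path


def list_upload_files(folder: str) -> list:
    """Return the sorted relative paths of all non-hidden files under folder."""
    root = Path(folder)
    files = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        # Skip anything inside a hidden directory such as .git
        if path.is_file() and not any(part.startswith(".") for part in rel.parts):
            files.append(rel.as_posix())
    return sorted(files)


def push_folder(folder: str, repo_id: str) -> None:
    """Create the model repo if needed and upload the folder contents to it."""
    # Imported lazily so list_upload_files works without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi()
    api.create_repo(repo_id=repo_id, exist_ok=True)
    api.upload_folder(folder_path=folder, repo_id=repo_id, repo_type="model")
```

Calling `list_upload_files` first gives a quick look at what is in the folder before anything is sent to the hub.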

Kamel · September 1, 2021, 1:35pm

Thanks a lot.
Is this enough for the tokenizer as well? I heard I also have to provide the tokenizer with the model.

nielsr · September 1, 2021, 1:38pm

Yes, you should also include the tokenizer files. As these are framework-independent, you can use the ones that were saved by the run_mlm.py script.
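
Since the tokenizer files do not depend on the framework, they can simply be copied next to the converted PyTorch weights. Here is a small sketch of that step. The list of file names is an assumption about what a BERT tokenizer saved by Transformers typically writes, and `copy_tokenizer_files` is a hypothetical helper; adjust the names to whatever your training directory actually contains.

```python
import shutil
from pathlib import Path

# Assumed tokenizer file names; not every training run produces all of them.
TOKENIZER_FILES = (
    "vocab.txt",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "tokenizer.json",
)


def copy_tokenizer_files(src_dir: str, dst_dir: str, names=TOKENIZER_FILES) -> list:
    """Copy the tokenizer files that exist in src_dir into dst_dir.

    Returns the names of the files that were copied.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        src = Path(src_dir) / name
        if src.is_file():
            shutil.copy2(src, dst / name)
            copied.append(name)
    return copied
```

The returned list makes it easy to spot when an expected file such as vocab.txt was not found in the source directory.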

Kamel · September 1, 2021, 1:41pm

Thanks, one more question about it:
How can PyTorch users use my tokenizer with AutoTokenizer?
Is providing my vocab.txt enough, or is tokenizer.model the one they need?

nielsr · September 1, 2021, 1:42pm

Not sure what tokenizer.model is; normally the vocab.txt is enough.
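
To illustrate how a bare vocab.txt can be turned into something loadable, here is a minimal sketch. It assumes vocab.txt is a WordPiece vocabulary in the usual BERT format (one token per line) and that the casing setting matches the one used during pre-training. The helper names `missing_special_tokens` and `export_bert_tokenizer` are hypothetical.

```python
from pathlib import Path

# Special tokens that BertTokenizer expects by default.
BERT_SPECIAL_TOKENS = ("[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]")


def missing_special_tokens(vocab_path: str) -> list:
    """Return the default BERT special tokens that are absent from vocab.txt."""
    lines = Path(vocab_path).read_text(encoding="utf-8").splitlines()
    vocab = {line.strip() for line in lines}
    return [tok for tok in BERT_SPECIAL_TOKENS if tok not in vocab]


def export_bert_tokenizer(vocab_path: str, out_dir: str, do_lower_case: bool = True) -> None:
    """Build a BertTokenizer from vocab.txt and save its files to out_dir."""
    # Imported lazily so the check above works without transformers installed.
    from transformers import BertTokenizer

    # do_lower_case should match how the text was preprocessed for pre-training.
    tokenizer = BertTokenizer(vocab_file=vocab_path, do_lower_case=do_lower_case)
    tokenizer.save_pretrained(out_dir)
```

If the saved files are uploaded together with the model's config.json, `AutoTokenizer.from_pretrained` pointed at that repository should be able to resolve the BERT tokenizer, though this is worth verifying with a quick load before sharing the model.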

Kamel · September 1, 2021, 1:43pm

Thanks a lot for your complete answer, Nielsr.
