@cakiki conceptually, I wonder whether training my own language model from scratch and then fine-tuning it for text classification would work better than fine-tuning the same old DistilBERT model that everybody is using. The corpus I am working with is highly specialized (medicine, for instance), so a dedicated language model makes sense.
I think it would depend on how much (and how different) specialized data you have (perhaps compare that to the size of the dataset the model was initially pretrained on). If it’s a considerable amount, it might make sense to continue pre-training from the checkpoint of the model you’re interested in.