Are albert-base-v1 (and v2) pretrained enough?

Hi all,

I have a question about the albert-base-v1 and v2 models uploaded to the Hugging Face model hub. To validate the initial model performance (without any additional training), I checked the MLM loss of the ALBERT base models on the BookCorpus dataset and on SQuAD context data (which is essentially Wikipedia text), using this example script from the transformers repo. The average MLM loss comes out around 2.5 (SQuAD contexts) and 3.2 (BookCorpus) for albert-base-v1, which is much worse than I expected, given that both of those datasets must have been used to pretrain the ALBERT base model. The values were even worse for v2.
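
For reference, a minimal sketch of the kind of check I mean (not the exact script I used; the sentences and the 15% masking probability are just illustrative assumptions):

```python
import torch
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling)

model_name = "albert-base-v1"  # or "albert-base-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

# Toy examples; in practice this would iterate over BookCorpus / SQuAD contexts.
texts = [
    "The quick brown fox jumps over the lazy dog.",
    "Paris is the capital of France.",
]

# Randomly mask 15% of tokens, as in standard MLM evaluation.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
encodings = [tokenizer(t, truncation=True, max_length=128) for t in texts]
batch = collator(encodings)

with torch.no_grad():
    outputs = model(input_ids=batch["input_ids"],
                    attention_mask=batch.get("attention_mask"),
                    labels=batch["labels"])

# Average cross-entropy over the masked positions (noisy on so few sentences).
print(f"average MLM loss: {outputs.loss.item():.3f}")
```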

I would like to ask whether the albert-base models were trained until the loss converged, or only for one or two epochs. It would also be great if an actual training script and/or the MLM loss history for the ALBERT pretraining were available.

Thanks!
Hojin

The models on the hub are not trained by HuggingFace (unless explicitly mentioned), so the weights are the original implementation's weights, ported/converted to the HF implementation. Whether or not the model is trained “well enough” is a question for the original authors of the model.


Thanks for the reply @BramVanroy! And thanks a lot for the correction. I was wondering why the sentence order prediction (SOP) classification head weights (the ALBERT counterpart of next sentence prediction) were not ported/converted, even though they are provided by the original implementation? Thanks!

As far as I know, the BERT NSP weights are present when you use BertForPreTraining or BertForNextSentencePrediction. But you are right that AlbertForPreTraining does not seem to load the SOP weights, and there is no SOP-only model definition. I am not sure why that is the case.
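
One quick way to check this yourself (a sketch, assuming a recent transformers version): when a head's weights are missing from the checkpoint, `from_pretrained` prints a warning that those weights were newly initialised, so loading the pretraining classes and looking at the warnings shows what was actually ported.

```python
from transformers import AlbertForPreTraining, BertForPreTraining

# BertForPreTraining carries both the MLM head and the NSP head
# (bert.cls.seq_relationship), which load from the original checkpoint.
bert = BertForPreTraining.from_pretrained("bert-base-uncased")
print("NSP head:", bert.cls.seq_relationship)

# AlbertForPreTraining defines an SOP classifier (albert.sop_classifier);
# a warning about newly initialised weights here would indicate that the
# SOP head was not ported from the original checkpoint.
albert = AlbertForPreTraining.from_pretrained("albert-base-v1")
print("SOP head:", albert.sop_classifier)
```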

Thanks, @BramVanroy.