Hello again,
I am fairly sure you are right, and you would have to train a model from scratch if you want to alter the layer size.
I believe you could increase the width of the model by using more attention heads in each block, by increasing the hidden size, or both. For example, bert-large is 24-layer, 1024-hidden, 16-heads, 340M parameters, whereas bert-base is 12-layer, 768-hidden, 12-heads, 110M parameters.
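If it helps, here is a minimal sketch (using the Hugging Face transformers library, assuming that is what you are working with) of building a BERT with a custom width. The sizes below are made up for illustration; the resulting weights are randomly initialized, so the model would still need to be pre-trained from scratch:

```python
from transformers import BertConfig, BertModel

# Hypothetical widened configuration: hidden_size must be divisible
# by num_attention_heads, and intermediate_size is conventionally
# 4 * hidden_size.
config = BertConfig(
    num_hidden_layers=12,
    hidden_size=1024,
    num_attention_heads=16,
    intermediate_size=4096,
)

# Freshly (randomly) initialized weights -- no pre-trained checkpoint
# matches this shape, so training from scratch is required.
model = BertModel(config)
```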
I think the hidden size corresponds to the number of real numbers used to represent each token, so if you changed it you would also need to train a new embedding layer.
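As a quick check on that point, the word-embedding matrix in transformers has shape (vocab_size, hidden_size), so its dimensions follow whatever hidden size you pick (again just a sketch with made-up numbers):

```python
from transformers import BertConfig, BertModel

config = BertConfig(hidden_size=1024, num_attention_heads=16)
model = BertModel(config)

# One hidden_size-dimensional vector per vocabulary token, so the
# embedding matrix changes shape (and needs retraining) whenever
# hidden_size changes.
print(model.embeddings.word_embeddings.weight.shape)
# -> torch.Size([30522, 1024])  (30522 is the default vocab_size)
```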