In the Transformer paper (Vaswani et al.), the output dimension of the encoder is d_model = 512. Is the hidden size in BERT (denoted H in the BERT paper) actually the same quantity as d_model in the Transformer? And if so, why does it change from 512 to 768 in BERT-base?
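For reference, here is where I got the 768 from, assuming the Hugging Face `transformers` library (the config fields below come from that library, not from either paper):

```python
from transformers import BertConfig

# The default BertConfig corresponds to BERT-base; hidden_size is the
# per-token representation dimension, i.e. what I believe plays the
# role of d_model in the original Transformer.
config = BertConfig()
print(config.hidden_size)          # 768 (H in the BERT paper)
print(config.num_hidden_layers)    # 12  (L)
print(config.num_attention_heads)  # 12  (A)
```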