What is the input vector size for a BERT and Transformer-XL?

gibbo0789 · September 2, 2020, 3:00pm

Does anyone know what size vectors the BERT and Transformer-XL models take and output?

For example, I know that bert-large is 24-layer, 1024-hidden, 16-heads per block, 340M parameters. (bert-base is 12 heads per block) does that mean it takes a vector size of [24,1024,16]? or am I miss understanding?

Any help is much appreciated

sgugger · September 2, 2020, 3:38pm

No the inputs are usually a tensor of ints of size batch_size x sequence_length (with integers between 0 and the vocabulary size of the model -1). You can find all expected shapes and output shapes in the documentation, for instance here for BERT.

Topic		Replies	Views
How to see BERT,BART... output dimensions? Beginners	2	6019	June 4, 2021
Sizes of Query, key and value vector in Bert Model 🤗Transformers	3	6023	March 25, 2021
BERT: What is the shape of each Transformer Encoder block in the final hidden state? Intermediate	7	13037	March 16, 2022
How to use transformer attention model when the input is features Beginners	1	1253	October 12, 2020
Feed output from one transformer model as input to another 🤗Transformers	1	1112	July 30, 2021

What is the input vector size for a BERT and Transformer-XL?

Related topics