I am trying to train a Transformer-XL model from scratch, but I am struggling to understand the meaning of the d_head and d_inner parameters in the config.
I understand d_head as the dimension of each attention head's value vector after attention has been applied, but I have no clue what d_inner should be.
The doc only states that:
d_inner (int, optional, defaults to 4096) — Inner dimension in FF
FF stands for feed-forward. d_inner is the dimensionality of the hidden layer of the feed-forward network (FF, FFN, also called MLP since it is a multilayer perceptron) inside each layer of the Transformer-XL model.
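A minimal sketch of that feed-forward block, using plain NumPy rather than the actual Transformer-XL implementation (the sizes below mirror the defaults discussed here, and I'm assuming d_embed == d_model for simplicity):

```python
import numpy as np

d_model = 1024   # width of the token representations (assumed equal to d_embed)
d_inner = 4096   # hidden width of the position-wise feed-forward block

rng = np.random.default_rng(0)
W1 = rng.standard_normal((d_model, d_inner)) * 0.02  # first FF projection: expand
W2 = rng.standard_normal((d_inner, d_model)) * 0.02  # second FF projection: contract

def feed_forward(x):
    """Position-wise FFN: expand each token vector to d_inner, apply ReLU,
    then project back down to d_model."""
    h = np.maximum(x @ W1, 0.0)   # shape (seq_len, d_inner)
    return h @ W2                  # shape (seq_len, d_model)

x = rng.standard_normal((10, d_model))  # a sequence of 10 token vectors
y = feed_forward(x)
print(y.shape)  # (10, 1024)
```

The key point is that d_inner only exists inside this block; the output is projected back to d_model before the next layer sees it.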
I see. So just to be clear: does this mean that the very first attention layer of the transformer needs an input of dimension d_embed, and the second attention layer expects an input of dimension d_inner?
Edit: re-reading the original "Attention Is All You Need" paper, I realize that the feed-forward block actually contains two linear layers, and that d_inner is the output dimension of the first one. The second one projects the input back down to d_embed, so every attention layer still receives an input of dimension d_embed.
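That realization is easy to verify with a toy example: because the second linear layer restores the original width, each layer maps d_model to d_model, so layers stack cleanly and d_inner never appears at a layer boundary. A small sketch with hypothetical tiny sizes (not the real defaults):

```python
import numpy as np

d_model, d_inner = 8, 32  # tiny illustrative sizes, not real config values
rng = np.random.default_rng(1)

def make_ffn():
    """Build one two-layer feed-forward block: d_model -> d_inner -> d_model."""
    W1 = rng.standard_normal((d_model, d_inner))
    W2 = rng.standard_normal((d_inner, d_model))
    return lambda x: np.maximum(x @ W1, 0.0) @ W2

layer1, layer2 = make_ffn(), make_ffn()
x = rng.standard_normal((5, d_model))
out = layer2(layer1(x))   # stacks cleanly: each block maps d_model -> d_model
print(out.shape)  # (5, 8)
```

If d_inner were the output dimension of a whole layer, `layer2(layer1(x))` would fail with a shape mismatch; it works precisely because the second projection brings the width back to d_model.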