I am trying to understand the heads and bodies of these models, so I am using the torchsummary library to inspect them.
The architectural details for each variant of the BERT model are shown below.
NOTE: I am trying to implement something new on top of a language-understanding model, so I am not using the ready-made heads such as question answering or mask filling.
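For reference, the parameter totals in the summaries below can be reproduced without downloading any weights, since BertConfig() defaults match bert-base-uncased. This is a minimal sketch assuming the transformers library is installed; it builds a randomly initialised model and counts parameters by hand:

```python
from transformers import BertConfig, BertModel

# BertConfig() defaults match bert-base-uncased
# (12 layers, hidden size 768, vocab size 30522),
# so a randomly initialised model has the same shapes.
model = BertModel(BertConfig())

total = sum(p.numel() for p in model.parameters())
print(total)  # 109482240, matching the AutoModel summary below
```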
I have the following questions:
1. Why do the models with task heads show only a single collapsed ModuleList: 3-6 entry for the encoder, while AutoModel lists BertLayer: 3-1 through BertLayer: 3-12 individually? Does ModuleList: 3-6 mean that only layers 3-1 to 3-6 are present?
2. How do I build a custom BERT architecture? Do I need to use AutoModel? If so, how do I drop some of the layers? For example, AutoModel shows BertLayer: 3-1 through 3-12, but AutoModelForQuestionAnswering does not show the individual ModuleList layers.
3. According to the Transformer architecture, BERT uses the encoder side. If I use only BertEmbeddings: 2-1 and BertEncoder: 2-2 (with the ModuleList: 3-6 layers), is that enough for the model to understand language? (I ask because the encoder part of the Transformer architecture is what does the language understanding.)
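To make questions 2 and 3 concrete, this is the kind of construction I have in mind. It is only a sketch assuming the transformers library; a randomly initialised model is used so nothing is downloaded. It builds a 6-layer model with no pooler (embeddings plus encoder only), and also shows slicing layers out of an already-built model:

```python
from transformers import BertConfig, BertModel

# Build a 6-layer BERT directly from the config.
config = BertConfig(num_hidden_layers=6)

# add_pooling_layer=False drops BertPooler, leaving only
# BertEmbeddings + BertEncoder.
model = BertModel(config, add_pooling_layer=False)

print(model.pooler)              # None
print(len(model.encoder.layer))  # 6

# Alternatively, layers can be dropped from an already-built model
# (e.g. one loaded with from_pretrained) by slicing the ModuleList:
full = BertModel(BertConfig())
full.encoder.layer = full.encoder.layer[:6]
full.config.num_hidden_layers = 6  # keep the config consistent
```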
For AutoModel
======================================================================
Layer (type:depth-idx) Param #
======================================================================
├─BertEmbeddings: 1-1 --
| └─Embedding: 2-1 23,440,896
| └─Embedding: 2-2 393,216
| └─Embedding: 2-3 1,536
| └─LayerNorm: 2-4 1,536
| └─Dropout: 2-5 --
├─BertEncoder: 1-2 --
| └─ModuleList: 2-6 --
| | └─BertLayer: 3-1 7,087,872
| | └─BertLayer: 3-2 7,087,872
| | └─BertLayer: 3-3 7,087,872
| | └─BertLayer: 3-4 7,087,872
| | └─BertLayer: 3-5 7,087,872
| | └─BertLayer: 3-6 7,087,872
| | └─BertLayer: 3-7 7,087,872
| | └─BertLayer: 3-8 7,087,872
| | └─BertLayer: 3-9 7,087,872
| | └─BertLayer: 3-10 7,087,872
| | └─BertLayer: 3-11 7,087,872
| | └─BertLayer: 3-12 7,087,872
├─BertPooler: 1-3 --
| └─Linear: 2-7 590,592
| └─Tanh: 2-8 --
======================================================================
Total params: 109,482,240
Trainable params: 109,482,240
Non-trainable params: 0
======================================================================
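As a sanity check on the per-layer figure above (7,087,872 parameters per BertLayer), a single layer can be instantiated on its own. This is a sketch assuming the transformers library; BertLayer is imported from the library's internal BERT module:

```python
from transformers import BertConfig
from transformers.models.bert.modeling_bert import BertLayer

# One encoder block with the bert-base-uncased default config.
layer = BertLayer(BertConfig())

per_layer = sum(p.numel() for p in layer.parameters())
print(per_layer)  # 7087872, matching each BertLayer row in the summary
```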
For AutoModelForQuestionAnswering
=================================================================
Layer (type:depth-idx) Param #
=================================================================
├─BertModel: 1-1 --
| └─BertEmbeddings: 2-1 --
| | └─Embedding: 3-1 23,440,896
| | └─Embedding: 3-2 393,216
| | └─Embedding: 3-3 1,536
| | └─LayerNorm: 3-4 1,536
| | └─Dropout: 3-5 --
| └─BertEncoder: 2-2 --
| | └─ModuleList: 3-6 85,054,464
├─Linear: 1-2 1,538
=================================================================
Total params: 108,893,186
Trainable params: 108,893,186
Non-trainable params: 0
=================================================================
For AutoModelForMaskedLM
===========================================================================
Layer (type:depth-idx) Param #
===========================================================================
├─BertModel: 1-1 --
| └─BertEmbeddings: 2-1 --
| | └─Embedding: 3-1 23,440,896
| | └─Embedding: 3-2 393,216
| | └─Embedding: 3-3 1,536
| | └─LayerNorm: 3-4 1,536
| | └─Dropout: 3-5 --
| └─BertEncoder: 2-2 --
| | └─ModuleList: 3-6 85,054,464
├─BertOnlyMLMHead: 1-2 --
| └─BertLMPredictionHead: 2-3 --
| | └─BertPredictionHeadTransform: 3-7 592,128
| | └─Linear: 3-8 23,471,418
===========================================================================
Total params: 132,955,194
Trainable params: 132,955,194
Non-trainable params: 0
===========================================================================
For AutoModelForTokenClassification
=================================================================
Layer (type:depth-idx) Param #
=================================================================
├─BertModel: 1-1 --
| └─BertEmbeddings: 2-1 --
| | └─Embedding: 3-1 23,440,896
| | └─Embedding: 3-2 393,216
| | └─Embedding: 3-3 1,536
| | └─LayerNorm: 3-4 1,536
| | └─Dropout: 3-5 --
| └─BertEncoder: 2-2 --
| | └─ModuleList: 3-6 85,054,464
├─Dropout: 1-2 --
├─Linear: 1-3 1,538
=================================================================
Total params: 108,893,186
Trainable params: 108,893,186
Non-trainable params: 0
=================================================================
For AutoModelForMultipleChoice
=================================================================
Layer (type:depth-idx) Param #
=================================================================
├─BertModel: 1-1 --
| └─BertEmbeddings: 2-1 --
| | └─Embedding: 3-1 23,440,896
| | └─Embedding: 3-2 393,216
| | └─Embedding: 3-3 1,536
| | └─LayerNorm: 3-4 1,536
| | └─Dropout: 3-5 --
| └─BertEncoder: 2-2 --
| | └─ModuleList: 3-6 85,054,464
| └─BertPooler: 2-3 --
| | └─Linear: 3-7 590,592
| | └─Tanh: 3-8 --
├─Dropout: 1-2 --
├─Linear: 1-3 769
=================================================================
Total params: 109,483,009
Trainable params: 109,483,009
Non-trainable params: 0
=================================================================
For AutoModelForCausalLM
===========================================================================
Layer (type:depth-idx) Param #
===========================================================================
├─BertModel: 1-1 --
| └─BertEmbeddings: 2-1 --
| | └─Embedding: 3-1 23,440,896
| | └─Embedding: 3-2 393,216
| | └─Embedding: 3-3 1,536
| | └─LayerNorm: 3-4 1,536
| | └─Dropout: 3-5 --
| └─BertEncoder: 2-2 --
| | └─ModuleList: 3-6 85,054,464
├─BertOnlyMLMHead: 1-2 --
| └─BertLMPredictionHead: 2-3 --
| | └─BertPredictionHeadTransform: 3-7 592,128
| | └─Linear: 3-8 23,471,418
===========================================================================
Total params: 132,955,194
Trainable params: 132,955,194
Non-trainable params: 0
===========================================================================
For AutoModelForSequenceClassification
=================================================================
Layer (type:depth-idx) Param #
=================================================================
├─BertModel: 1-1 --
| └─BertEmbeddings: 2-1 --
| | └─Embedding: 3-1 23,440,896
| | └─Embedding: 3-2 393,216
| | └─Embedding: 3-3 1,536
| | └─LayerNorm: 3-4 1,536
| | └─Dropout: 3-5 --
| └─BertEncoder: 2-2 --
| | └─ModuleList: 3-6 85,054,464
| └─BertPooler: 2-3 --
| | └─Linear: 3-7 590,592
| | └─Tanh: 3-8 --
├─Dropout: 1-2 --
├─Linear: 1-3 1,538
=================================================================
Total params: 109,483,778
Trainable params: 109,483,778
Non-trainable params: 0
=================================================================