Using BERT embeddings as input for transformer architecture

canovich · June 23, 2022, 6:53pm

I plan to use BERT’s embedding weights (as discussed here) to initialize the embedding layers of a transformer model. My question is: haven’t BERT’s embeddings already gone through the whole encoder stack to produce that matrix? If so, why shouldn’t I remove or freeze the encoder and feed the BERT embedding vectors directly into the decoder? I will also use BERT embeddings as the input to the decoder. Why shouldn’t I freeze the attention layers in the decoder too, given that the embeddings of the output text already carry attention information? Thanks in advance.
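
To make the setup in the question concrete, here is a minimal PyTorch sketch, not a definitive implementation. It loads a pretrained embedding matrix into the embedding layers of an encoder-decoder transformer, freezes only those layers, and then compares the plain embedding lookup with the encoder output for the same token in two different sentences. Everything named here is an assumption for illustration: the class `Seq2SeqWithPretrainedEmbeddings`, the sizes, and above all the random `pretrained_weight`, which stands in for BERT's real input-embedding matrix (with the transformers library that matrix is usually reachable through `get_input_embeddings().weight` on a loaded `BertModel`). Positional encodings are left out to keep the sketch short.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# ASSUMPTION: random stand-in for BERT's input word-embedding matrix.
# Sizes are hypothetical and much smaller than a real BERT vocabulary.
VOCAB_SIZE, D_MODEL = 100, 32
pretrained_weight = torch.randn(VOCAB_SIZE, D_MODEL)


class Seq2SeqWithPretrainedEmbeddings(nn.Module):
    """Hypothetical encoder-decoder whose embedding layers start from a given matrix."""

    def __init__(self, weight, freeze_embeddings=True):
        super().__init__()
        vocab_size, d_model = weight.shape
        # Clone so each layer owns its copy; freeze=True turns off its gradient.
        self.src_embed = nn.Embedding.from_pretrained(
            weight.detach().clone(), freeze=freeze_embeddings)
        self.tgt_embed = nn.Embedding.from_pretrained(
            weight.detach().clone(), freeze=freeze_embeddings)
        # Encoder and decoder are freshly initialized and stay trainable.
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            dim_feedforward=64, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Causal mask: each target position may attend only to earlier positions.
        n = tgt_ids.size(1)
        tgt_mask = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
        hidden = self.transformer(
            self.src_embed(src_ids), self.tgt_embed(tgt_ids), tgt_mask=tgt_mask)
        return self.out(hidden)


model = Seq2SeqWithPretrainedEmbeddings(pretrained_weight)
model.eval()  # disable dropout so the comparison below is deterministic

# Token id 5 opens both sentences, but its neighbours differ.
src_a = torch.tensor([[5, 7, 9]])
src_b = torch.tensor([[5, 2, 4]])
tgt = torch.tensor([[1, 5, 3]])

with torch.no_grad():
    logits = model(src_a, tgt)

    # The embedding lookup depends only on the token id.
    emb_a, emb_b = model.src_embed(src_a), model.src_embed(src_b)
    same_lookup = torch.equal(emb_a[0, 0], emb_b[0, 0])

    # The encoder output for that token also depends on the rest of the sentence.
    enc_a = model.transformer.encoder(emb_a)
    enc_b = model.transformer.encoder(emb_b)
    same_context = torch.allclose(enc_a[0, 0], enc_b[0, 0])

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
```

In this sketch the lookup vector for a token is identical across sentences, while the encoder output for it is not, and `trainable` lists only the attention, feed-forward, normalization and output parameters. Freezing those as well would be a matter of calling `requires_grad_(False)` on the relevant submodules; whether that is a good idea depends on where their weights come from.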

