Can we access the attention and feed-forward components of a BERT layer?

I want to access the attention component and feed-forward component for an experiment.

I initialized a BERT model from bert-base-uncased.
After that, I tried to access the attention component through model.encoder.layer, but I am not getting what I want.
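Roughly, this is what I did (a minimal reconstruction of my attempt, assuming BertModel.from_pretrained):

```python
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
layers = model.encoder.layer   # ModuleList of BertLayer blocks
print(layers.modules)          # prints the bound method / repr, not the component I want
```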

When I print model.encoder.layer.modules, I get:

<bound method Module.modules of ModuleList(
  (0): BertLayer(
    (attention): BertAttention(
      (self): BertSelfAttention(
        (query): Linear(in_features=768, out_features=768, bias=True)
        (key): Linear(in_features=768, out_features=768, bias=True)
        (value): Linear(in_features=768, out_features=768, bias=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (output): BertSelfOutput(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
    (intermediate): BertIntermediate(
      (dense): Linear(in_features=768, out_features=3072, bias=True)
      (intermediate_act_fn): GELUActivation()
    )
    (output): BertOutput(
      (dense): Linear(in_features=3072, out_features=768, bias=True)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
  )
...
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
  )
)>

My question is: can we access the components (attention or feed-forward) of a BERT layer using the Hugging Face API?


Hi, have you found a solution?

Yes, I have found a solution: you can access the components.

Just look through the BERT implementation (https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py).
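As a quick way to discover the exact attribute paths yourself, you can print the named submodules of one encoder layer (a minimal sketch, assuming a plain BertModel loaded from bert-base-uncased):

```python
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")

# List every submodule of the first encoder layer together with its attribute path
for name, module in model.encoder.layer[0].named_modules():
    print(name or "<layer root>", "->", type(module).__name__)
```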

In that implementation, you can access the embedding module as model.bert.embeddings(input_ids=input_ids, token_type_ids=token_type_ids), the self-attention module as model.bert.encoder.layer[i].attention.self (its output projection is model.bert.encoder.layer[i].attention.output), the intermediate part of the feed-forward as model.bert.encoder.layer[i].intermediate, and the output part of the feed-forward as model.bert.encoder.layer[i].output (whose Linear is .output.dense).

Here, i refers to the i-th transformer layer of the BERT model. The .bert prefix applies when the model wraps BertModel with a task head (e.g. BertForSequenceClassification); for a plain BertModel, drop it and use model.encoder.layer[i] directly.
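Putting it together, here is a minimal sketch that grabs the components of layer i and runs a hidden-state tensor through them by hand (assuming a plain BertModel from bert-base-uncased; the submodule names match the printout in the question, but exact forward signatures can differ slightly across transformers versions):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("Hello world", return_tensors="pt")

i = 0                              # index of the transformer layer
layer = model.encoder.layer[i]     # BertLayer

with torch.no_grad():
    # Embedding module
    hidden = model.embeddings(
        input_ids=inputs["input_ids"],
        token_type_ids=inputs["token_type_ids"],
    )

    # Attention block: self-attention, then its output projection (+ residual and LayerNorm)
    self_out = layer.attention.self(hidden)[0]            # BertSelfAttention returns a tuple
    attn_out = layer.attention.output(self_out, hidden)   # BertSelfOutput

    # Feed-forward block: intermediate (768 -> 3072), then output (3072 -> 768, + residual and LayerNorm)
    inter_out = layer.intermediate(attn_out)               # BertIntermediate
    ffn_out = layer.output(inter_out, attn_out)             # BertOutput

print(ffn_out.shape)  # (batch, seq_len, 768)
```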