Can we access the attention and feed-forward components of a BERT layer?

I want to access the attention component and feed-forward component for an experiment.

I initialized a BERT model from bert-base-uncased.
After that, I tried to access the attention component through model.encoder.layer, but I am not getting what I want.
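Roughly, this is what I did (a minimal reconstruction of my attempt, assuming BertModel.from_pretrained):

```python
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
layers = model.encoder.layer   # ModuleList of BertLayer blocks
print(layers.modules)          # prints the bound method / repr, not the component I want
```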

When I print model.encoder.layer.modules, I get:

<bound method Module.modules of ModuleList(
  (0): BertLayer(
    (attention): BertAttention(
      (self): BertSelfAttention(
        (query): Linear(in_features=768, out_features=768, bias=True)
        (key): Linear(in_features=768, out_features=768, bias=True)
        (value): Linear(in_features=768, out_features=768, bias=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (output): BertSelfOutput(
        (dense): Linear(in_features=768, out_features=768, bias=True)
        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
    (intermediate): BertIntermediate(
      (dense): Linear(in_features=768, out_features=3072, bias=True)
      (intermediate_act_fn): GELUActivation()
    )
    (output): BertOutput(
      (dense): Linear(in_features=3072, out_features=768, bias=True)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
  )
...
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
  )
)>

My question is: can we access the components (attention or feed-forward) of a BERT layer using the Hugging Face API?


Hi, have you found a solution?

Yes, I have found a solution: you can access the components.

Just look through the BERT implementation (https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py).
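As a quick way to discover the exact attribute paths yourself, you can print the named submodules of one encoder layer (a minimal sketch, assuming a plain BertModel loaded from bert-base-uncased):

```python
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")

# List every submodule of the first encoder layer together with its attribute path
for name, module in model.encoder.layer[0].named_modules():
    print(name or "<layer root>", "->", type(module).__name__)
```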

In that implementation, you can access the embedding module as model.bert.embeddings(input_ids=input_ids, token_type_ids=token_type_ids), the self-attention module as model.bert.encoder.layer[i].attention.self (its output projection is model.bert.encoder.layer[i].attention.output), the intermediate part of the feed-forward as model.bert.encoder.layer[i].intermediate, and the output part of the feed-forward as model.bert.encoder.layer[i].output (whose Linear is .output.dense).

Here, i refers to the i-th transformer layer of the BERT model. The .bert prefix applies when the model wraps BertModel with a task head (e.g. BertForSequenceClassification); for a plain BertModel, drop it and use model.encoder.layer[i] directly.
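Putting it together, here is a minimal sketch that grabs the components of layer i and runs a hidden-state tensor through them by hand (assuming a plain BertModel from bert-base-uncased; the submodule names match the printout in the question, but exact forward signatures can differ slightly across transformers versions):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("Hello world", return_tensors="pt")

i = 0                              # index of the transformer layer
layer = model.encoder.layer[i]     # BertLayer

with torch.no_grad():
    # Embedding module
    hidden = model.embeddings(
        input_ids=inputs["input_ids"],
        token_type_ids=inputs["token_type_ids"],
    )

    # Attention block: self-attention, then its output projection (+ residual and LayerNorm)
    self_out = layer.attention.self(hidden)[0]            # BertSelfAttention returns a tuple
    attn_out = layer.attention.output(self_out, hidden)   # BertSelfOutput

    # Feed-forward block: intermediate (768 -> 3072), then output (3072 -> 768, + residual and LayerNorm)
    inter_out = layer.intermediate(attn_out)               # BertIntermediate
    ffn_out = layer.output(inter_out, attn_out)             # BertOutput

print(ffn_out.shape)  # (batch, seq_len, 768)
```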