BertConfig: num_attention_heads

Hi, I am using BertConfig to create an encoder-decoder model in the following way:

from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

encoder = BertConfig()
decoder = BertConfig()
config = EncoderDecoderConfig.from_encoder_decoder_configs(encoder, decoder)
bert2bert = EncoderDecoderModel(config=config)
bert2bert.config.decoder.is_decoder = True
bert2bert.config.decoder.add_cross_attention = True

bert2bert.config.encoder.num_attention_heads = 12
print(bert2bert.encoder.num_parameters(only_trainable=True), bert2bert.encoder.config.num_attention_heads)

With the default of 12 attention heads, the trainable parameter count and attention head count for the encoder are as below:

86742528 12

However, when I change the number of attention heads to 4, the number of trainable parameters stays the same while the reported number of attention heads changes (as below). Can anyone help me out?


bert2bert.config.decoder.add_cross_attention = True
bert2bert.config.encoder.num_attention_heads = 4
print(bert2bert.encoder.num_parameters(only_trainable=True), bert2bert.encoder.config.num_attention_heads)

86742528 4

@valhalla, could you maybe help me out here? Is this expected behavior or not?

This is expected: the total number of attention parameters won't change. With 12 attention heads, the Q, K, and V projection matrices (each hidden_size × hidden_size) are split into 12 smaller "heads" of dimension hidden_size / 12; when you specify 4 heads, the same matrices are split into 4 heads of dimension hidden_size / 4. The parameter count is determined by hidden_size, not by the number of heads. Note also that mutating bert2bert.config.encoder.num_attention_heads after the model has been instantiated only updates the stored config value; the already-built weights are untouched.
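For illustration, here is a minimal sketch (using a plain BertModel rather than the full encoder-decoder stack, to keep it short) showing that the parameter count is invariant to the head count as long as hidden_size stays fixed. The head count is set on the config before the model is instantiated:

from transformers import BertConfig, BertModel

for heads in (12, 4):
    # hidden_size stays at the BERT-base default of 768; each head then
    # has dimension 768 // heads (64 for 12 heads, 192 for 4 heads)
    cfg = BertConfig(num_attention_heads=heads)
    model = BertModel(cfg)
    print(heads, model.num_parameters(only_trainable=True))

Both iterations print the same total, because the Q, K, and V projections are 768 × 768 matrices either way; num_attention_heads only controls how those matrices are sliced into heads at runtime.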