Transferring attention weights between different attention classes

I want to apply the weights of the attention layers of legal-bert (BertSelfAttention class) to the Longformer local-attention QKV projections (LongformerSelfAttention class):

for new_layer, original_layer in zip(longformer.encoder.layer, legal_bert.encoder.layer):
    # copy the Q/K/V projection weights from BertSelfAttention into LongformerSelfAttention
    new_layer.attention.self.query.weight = original_layer.attention.self.query.weight
    new_layer.attention.self.key.weight = original_layer.attention.self.key.weight
    new_layer.attention.self.value.weight = original_layer.attention.self.value.weight
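
For context, here is a fuller sketch of what I am trying to do, assuming the standard Hugging Face module layout and using example checkpoint names (nlpaueb/legal-bert-base-uncased and allenai/longformer-base-4096); it also seeds LongformerSelfAttention's separate global projections (query_global/key_global/value_global) from the same BERT weights, which is just one possible choice:

import torch
from transformers import AutoModel, LongformerModel

legal_bert = AutoModel.from_pretrained("nlpaueb/legal-bert-base-uncased")      # assumed checkpoint
longformer = LongformerModel.from_pretrained("allenai/longformer-base-4096")   # assumed checkpoint

with torch.no_grad():
    for new_layer, original_layer in zip(longformer.encoder.layer, legal_bert.encoder.layer):
        bert_self = original_layer.attention.self   # BertSelfAttention
        lf_self = new_layer.attention.self           # LongformerSelfAttention
        for proj in ("query", "key", "value"):
            src = getattr(bert_self, proj)
            # local attention Q/K/V projections
            getattr(lf_self, proj).weight.copy_(src.weight)
            getattr(lf_self, proj).bias.copy_(src.bias)
            # LongformerSelfAttention also has *_global projections for global tokens;
            # initializing them from the same BERT weights is an assumption on my part
            getattr(lf_self, proj + "_global").weight.copy_(src.weight)
            getattr(lf_self, proj + "_global").bias.copy_(src.bias)

Using copy_ inside torch.no_grad() clones the values instead of tying the two models' parameters together, and both base models use a hidden size of 768 with 12 layers, so the shapes line up.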