How to add an additional module to the BERT architecture, then load the original weights and use it

I aim to

  1. Add an additional module to the BERT architecture. In detail, each layer should take a product with a vector that is similar but not identical across layers, depending on the layer index and the type of BERT model.
  2. Load BERT's pretrained weights into this new model.
  3. Then use the model directly, or continue training it.

I'm confused about how to do this, since we usually load a model's configuration and weights from Hugging Face directly. The extra operation could presumably be applied to the hidden_states after each layer, but can the vector then differ depending on the model type?

In more detail, I'm working with prajjwal1/bert-tiny now and bert-base-uncased as a next step.
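One pattern that may fit the steps above (a minimal sketch, not a definitive answer): build or load a standard `BertModel` first, so its weights keep their original parameter names, and only then wrap each encoder layer in a small module that applies the extra per-layer vector. The `ScaledLayer` name, the elementwise product, and the all-ones initialization are my assumptions; a tiny random-init config stands in for the pretrained checkpoint so the snippet runs offline.

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel

class ScaledLayer(nn.Module):
    """Wraps a BertLayer and multiplies its output hidden states by a
    learnable, layer-specific vector (assumed elementwise product)."""
    def __init__(self, layer, hidden_size):
        super().__init__()
        self.layer = layer  # the original, already-initialized BertLayer
        self.scale = nn.Parameter(torch.ones(hidden_size))  # per-layer vector

    def forward(self, hidden_states, *args, **kwargs):
        out = self.layer(hidden_states, *args, **kwargs)
        # BertLayer usually returns a tuple whose first element is the
        # hidden states; handle a bare tensor too, just in case.
        if isinstance(out, tuple):
            return (out[0] * self.scale,) + out[1:]
        return out * self.scale

# Random-init stand-in for BertModel.from_pretrained("prajjwal1/bert-tiny");
# with a real checkpoint the wrapping step is identical, because the
# pretrained weights are already inside each layer before we wrap it.
config = BertConfig(hidden_size=128, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=512)
model = BertModel(config)

# Wrap every encoder layer; each wrapper gets its own scale vector, so
# initialization can depend on the layer index or the model variant.
model.encoder.layer = nn.ModuleList(
    ScaledLayer(layer, config.hidden_size) for layer in model.encoder.layer
)

ids = torch.randint(0, config.vocab_size, (2, 8))
out = model(ids)
```

Because the vector lives in the wrapper rather than in the checkpoint, loading `prajjwal1/bert-tiny` versus `bert-base-uncased` needs no `state_dict` surgery; you wrap after `from_pretrained(...)` and the new parameters simply train from their initial values.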