I aim to:
- Add an additional module to the BERT architecture. Specifically, each layer should take a cross product with a similar but not identical vector, where the vector depends on the layer index and the type of BERT model
- Load BERT's pretrained weights into this new model
- Then either use the model directly or continue training it
I'm very confused about how to do this, since we usually just load a model's configuration and weights from Hugging Face directly. This might be done by operating on the hidden_states of each layer, but can the vector then be different for different model types?
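To make the goal concrete, here is a rough sketch of what I'm imagining: wrap each pretrained `BertLayer` in a module that multiplies its output hidden states by a learnable per-layer vector. The class name `ScaledBertLayer` is my own, not from any library, and I use a tiny randomly initialized config so the sketch runs offline; in the real setup I would swap in `BertModel.from_pretrained("bert-base-uncased")` so the wrapped layers carry the pretrained weights.

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertModel

class ScaledBertLayer(nn.Module):
    """Hypothetical wrapper: multiplies a BertLayer's output hidden
    states by a learnable vector that can differ per layer index
    (and could be initialized differently per model type)."""
    def __init__(self, pretrained_layer, hidden_size, layer_index):
        super().__init__()
        self.layer = pretrained_layer
        self.layer_index = layer_index
        self.scale = nn.Parameter(torch.ones(hidden_size))

    def forward(self, hidden_states, *args, **kwargs):
        outputs = self.layer(hidden_states, *args, **kwargs)
        # BertLayer returns a tuple; rescale only the hidden states.
        return (outputs[0] * self.scale,) + outputs[1:]

# Tiny random config so this runs offline; replace with
# BertModel.from_pretrained("bert-base-uncased") to get real weights.
config = BertConfig(hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64,
                    vocab_size=100)
model = BertModel(config)

# Wrap every encoder layer; the pretrained weights stay inside
# the wrapped layers, only the scale vectors are new parameters.
for i, layer in enumerate(model.encoder.layer):
    model.encoder.layer[i] = ScaledBertLayer(layer, config.hidden_size, i)

ids = torch.randint(0, 100, (1, 8))
out = model(input_ids=ids)
print(tuple(out.last_hidden_state.shape))
```

Since the new `scale` parameters are ordinary `nn.Parameter`s, the whole model can be fine-tuned as usual, or the wrapped layers can be frozen to train only the vectors. Is this the right way to do it, or is there a more standard mechanism?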