Is it possible to modify the forward behavior of a pre-trained model?

Hi, I’m currently using the mBART-50 model from Hugging Face via model = MBartForConditionalGeneration.from_pretrained(args.model_path), which is an encoder-decoder model.
In my use case, I also want to get the output tensor of the encoder in addition to that of the decoder (calling forward only returns the final output).
So I’m wondering whether there is some way to do this with the Hugging Face model, or whether I have to define a new model myself.
Thanks for any suggestions and advice in advance!

Hi,

That’s possible by simply passing output_hidden_states=True to the forward method (as seen here). This will ensure that the output dictionary contains a key called encoder_last_hidden_state.
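For example, something along these lines (a minimal sketch; the facebook/mbart-large-50 checkpoint and the example sentence are placeholders, your own args.model_path works the same way):

```python
import torch
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

# Placeholder checkpoint; substitute your own args.model_path
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50", src_lang="en_XX")

inputs = tokenizer("Hello world", return_tensors="pt")

with torch.no_grad():
    # mBART builds decoder_input_ids from input_ids when none are given
    outputs = model(**inputs, output_hidden_states=True)

# Final encoder output: (batch_size, src_len, hidden_size)
print(outputs.encoder_last_hidden_state.shape)

# Per-layer hidden states (embeddings plus one entry per layer),
# only populated because output_hidden_states=True was passed
print(len(outputs.encoder_hidden_states), len(outputs.decoder_hidden_states))
```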

Hi, thanks a lot for pointing that out! That will do fine for my use case. But will this increase GPU memory usage a lot? I think it requires storing the outputs of all hidden layers, while all I need is the output of the last layer.
-------------------update-------------------
I just found today that the model returns the output of the encoder by default. Setting output_hidden_states=True returns two extra tuples, i.e., all hidden states of the encoder and of the decoder. So this was a dumb question; I should have read the documentation and source code more carefully.
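For anyone who lands here later, a quick sketch of the difference (same placeholder checkpoint as above):

```python
import torch
from transformers import MBart50TokenizerFast, MBartForConditionalGeneration

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50", src_lang="en_XX")
inputs = tokenizer("Hello world", return_tensors="pt")

with torch.no_grad():
    default_out = model(**inputs)                               # no extra flags
    verbose_out = model(**inputs, output_hidden_states=True)

# The final encoder output is part of the output either way
print(default_out.encoder_last_hidden_state.shape)              # (1, src_len, 1024)

# The all-layer tuples exist only when the flag is set; otherwise they are None
print(default_out.encoder_hidden_states)                        # None
print(len(verbose_out.encoder_hidden_states))                   # num_layers + 1
```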