Is it possible to modify the forward behavior of a pre-trained model?

Hi, I’m currently using the mBART-50 model from Hugging Face via model = MBartForConditionalGeneration.from_pretrained(args.model_path), which is an encoder-decoder model.
In my use case, I also want to get the output tensor of the encoder in addition to that of the decoder (as far as I can tell, calling forward only returns the final output).
So I’m wondering whether there is some way to do this with the Hugging Face model, or whether I have to define a new model myself.
Thanks for any suggestions and advice in advance!


That’s possible by simply specifying output_hidden_states=True to the forward method (as seen here). This will ensure that the output dictionary contains a key called encoder_last_hidden_state.

Hi, thank you a lot for pointing that out! It would do fine for my use case. But will this increase GPU memory usage a lot? I think it needs to store the outputs of all hidden layers, while I only need the output of the last layer.
I just found today that the model also returns the output of the encoder by default. Setting output_hidden_states=True returns two extra tuples, i.e., all hidden states of the encoder and of the decoder. So this was a dumb question; I should have read the documentation and source code more carefully.
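
For anyone landing here later, the behavior described above can be checked with a minimal sketch. It assumes a tiny, randomly initialized MBartConfig (the sizes below are arbitrary and chosen only so that nothing has to be downloaded) rather than the real mBART-50 checkpoint, and the random token ids are placeholders for real tokenizer output.

```python
import torch
from transformers import MBartConfig, MBartForConditionalGeneration

# Assumption: a tiny random model stands in for the mBART-50 checkpoint.
config = MBartConfig(
    vocab_size=100,
    d_model=16,
    encoder_layers=2,
    decoder_layers=2,
    encoder_attention_heads=2,
    decoder_attention_heads=2,
    encoder_ffn_dim=32,
    decoder_ffn_dim=32,
    max_position_embeddings=64,
)
model = MBartForConditionalGeneration(config).eval()

# Placeholder inputs: batch of 1, source length 5, target length 4.
input_ids = torch.randint(3, 100, (1, 5))
decoder_input_ids = torch.randint(3, 100, (1, 4))

with torch.no_grad():
    # Default call: the encoder's final output is already part of the result.
    out = model(input_ids=input_ids, decoder_input_ids=decoder_input_ids)
    # With output_hidden_states=True, per-layer states are returned as well.
    out_all = model(
        input_ids=input_ids,
        decoder_input_ids=decoder_input_ids,
        output_hidden_states=True,
    )
    # If only the encoder output is needed, the encoder can be run on its own.
    enc_only = model.get_encoder()(input_ids=input_ids).last_hidden_state

print(out.encoder_last_hidden_state.shape)  # (batch, source length, d_model)
print(out.encoder_hidden_states)            # not populated by default
print(len(out_all.encoder_hidden_states))   # embeddings plus one entry per layer
```

With the real checkpoint, the same attributes should be available after loading through from_pretrained as in the original question. Running only the encoder via get_encoder() avoids the decoder pass altogether, which may help if memory is the main concern.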