Whisper: Forward Hook on final_layer_norm vs out.encoder_hidden_states

carlimminent · September 25, 2023, 2:51pm

Hi everyone,
I’m working on WhisperModel.from_pretrained("openai/whisper-base") and I have a question that may not

I have extracted the hidden-layer outputs in two ways: via a forward hook on the final_layer_norm of each encoder block, and from the built-in encoder_hidden_states (out.encoder_hidden_states) of the Hugging Face model. When I compare the two, the outputs differ. Could anyone explain what makes them different? Are the residual connection and layer normalization not applied to one of them?

Also, I’m assuming that the first entry of out.encoder_hidden_states is the output of the positional embeddings, if I’m correct.

Sample code:

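import torch
from transformers import WhisperModel

model = WhisperModel.from_pretrained("openai/whisper-base")
# mel: log-mel input features, prepared elsewhere (not shown here)

# Assumption: one slot per encoder block
hidden_layer_size = len(model.encoder.layers)
hidden_states_hf = [None] * hidden_layer_size
for i, block in enumerate(model.encoder.layers):
    # index=i binds the loop variable when the lambda is defined.
    # outputs is a (batch, seq, dim) tensor, so outputs[-1] is the last batch item.
    block.final_layer_norm.register_forward_hook(
        lambda _, inputs, outputs, index=i: hidden_states_hf.__setitem__(index, outputs[-1])
    )

tokens = torch.tensor([[1, 1]]) * model.config.decoder_start_token_id

model.eval()
with torch.no_grad():
    out = model(mel, decoder_input_ids=tokens, output_attentions=True,
                output_hidden_states=True)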
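One way to narrow this down without downloading a checkpoint is a toy reproduction. The sketch below assumes the encoder layer is pre-norm, meaning final_layer_norm feeds the feed-forward part and the residual is added afterwards, and that each hidden-state entry is recorded before the corresponding block runs. ToyPreNormBlock and its attn placeholder are hypothetical stand-ins, not the Hugging Face classes. Under those assumptions, a hook on final_layer_norm captures an intermediate tensor, while a hook on the block itself matches the next hidden-state entry.

```python
import torch
from torch import nn
import torch.nn.functional as F


class ToyPreNormBlock(nn.Module):
    """Hypothetical stand-in for a pre-norm encoder layer (not the HF class)."""

    def __init__(self, dim: int = 8):
        super().__init__()
        self.self_attn_layer_norm = nn.LayerNorm(dim)
        self.attn = nn.Linear(dim, dim)  # placeholder for self-attention
        self.final_layer_norm = nn.LayerNorm(dim)
        self.fc1 = nn.Linear(dim, 4 * dim)
        self.fc2 = nn.Linear(4 * dim, dim)

    def forward(self, x):
        # Attention sub-block: normalize first, then add the residual.
        x = x + self.attn(self.self_attn_layer_norm(x))
        # Feed-forward sub-block: a hook on final_layer_norm sees only `h`.
        h = self.final_layer_norm(x)
        return x + self.fc2(F.gelu(self.fc1(h)))


torch.manual_seed(0)
blocks = nn.ModuleList([ToyPreNormBlock() for _ in range(3)])

norm_outputs, block_outputs = {}, {}
for i, block in enumerate(blocks):
    # Hook on the inner layer norm (what the sample code above does).
    block.final_layer_norm.register_forward_hook(
        lambda _m, _inp, out, index=i: norm_outputs.__setitem__(index, out.detach())
    )
    # Hook on the whole block (captures the block's actual output).
    block.register_forward_hook(
        lambda _m, _inp, out, index=i: block_outputs.__setitem__(index, out.detach())
    )

hidden_states = []
x = torch.randn(1, 5, 8)
with torch.no_grad():
    for block in blocks:
        hidden_states.append(x)  # recorded before each block runs
        x = block(x)
    hidden_states.append(x)  # output of the last block

for i in range(len(blocks)):
    print(
        i,
        "block hook matches:", torch.allclose(block_outputs[i], hidden_states[i + 1]),
        "| layer-norm hook matches:", torch.allclose(norm_outputs[i], hidden_states[i + 1]),
    )
```

If the real model behaves the same way, hooking model.encoder.layers[i] itself and comparing against out.encoder_hidden_states[i + 1] should agree. Two caveats to check against the source: the Hugging Face layer may return a tuple, in which case the hook would need outputs[0], and I believe the last entry of encoder_hidden_states additionally passes through the encoder-level layer_norm, so the final block's hook would not be expected to match it.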
