Wav2Vec2 Hidden States

RajSang · February 27, 2023, 11:33pm

@sanchit-gandhi, when we set output_hidden_states=True for the Wav2Vec2 model, we get 13 tensors, 12 of which correspond to the outputs of the individual encoder layers. What is the very first output tensor? In BERT it corresponds to the output of the embedding layer. In Wav2Vec2, is it the output of the feature extractor, projected into the model's hidden space and combined with positional information?

Best,
Raj

sanchit-gandhi · March 3, 2023, 4:19pm

Hey @RajSang! Great question! That’s exactly right: the first hidden state is the output of the CNN layers (after the feature projection) with the positional embedding added, i.e. the latent speech representations that we pass into the first transformer layer.

It’s this hidden state in the code: transformers/modeling_wav2vec2.py at commit 8c40ba73d8091ebe0bdc8da5b634bf7951d18f99 in huggingface/transformers on GitHub.
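If you want to check this yourself, here is a minimal sketch. It assumes a small, randomly initialised Wav2Vec2Model (the config sizes below are made up for illustration, so nothing is downloaded) and relies on the encoder passing hidden_states to each layer as the first positional argument, which is how the current implementation appears to work but is an internal detail. A pre-hook captures the input to the first transformer layer and compares it with hidden_states[0]:

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2Model

# Tiny random model for illustration only; these sizes are assumptions,
# not the values of any released checkpoint.
config = Wav2Vec2Config(
    hidden_size=32,
    num_hidden_layers=12,
    num_attention_heads=4,
    intermediate_size=64,
    conv_dim=(16, 16),
    conv_stride=(5, 2),
    conv_kernel=(10, 3),
    num_conv_pos_embeddings=16,
    num_conv_pos_embedding_groups=4,
)
model = Wav2Vec2Model(config).eval()  # eval() disables dropout and masking

captured = {}

def save_first_layer_input(module, args):
    # args[0] is assumed to be the hidden_states tensor fed to the layer.
    captured["layer0_input"] = args[0].detach().clone()

handle = model.encoder.layers[0].register_forward_pre_hook(save_first_layer_input)

input_values = torch.randn(1, 16000)  # one second of fake 16 kHz audio
with torch.no_grad():
    outputs = model(input_values, output_hidden_states=True)
handle.remove()

hidden_states = outputs.hidden_states
print(len(hidden_states))  # num_hidden_layers + 1 = 13

# hidden_states[0] is what goes into the first transformer layer.
print(torch.equal(captured["layer0_input"], hidden_states[0]))

# The last entry is the same tensor as last_hidden_state.
print(torch.equal(hidden_states[-1], outputs.last_hidden_state))
```

With a real checkpoint you would load the model with from_pretrained and feed it processed audio instead of random noise; the indexing of hidden_states stays the same.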

Hope that answers your question!

RajSang · March 12, 2023, 10:05am

Thanks a lot Sanchit!
