Single embedding from single wav file for wav2vec models?

underdogliu1005 · September 29, 2023, 2:07am

I am not sure if this is the right channel to ask.

I am new to wav2vec models and understand that wav2vec usually acts as a “frontend” model, so we need to extract embeddings or features from it. I used the script below to produce embeddings for future use. For a single wav file, the output has shape [1, 212, 1024] for the hidden states and [1, 212, 512] for the extracted features.
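As a side note on the 212 in those shapes: it is the number of time frames, which depends on the length of the audio. A rough sketch of that arithmetic is below, assuming the usual wav2vec2 convolutional front end (seven layers with the kernel sizes and strides listed in the code, i.e. roughly one frame per 20 ms at 16 kHz). The 68,000-sample length is a hypothetical clip of about 4.25 s chosen for illustration, not the actual length of my file.

```python
# Assumed wav2vec2 feature-encoder layout: kernel size and stride per conv layer.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def num_frames(num_samples):
    """Number of output frames for a waveform with num_samples samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Standard output-length formula for an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


print(num_frames(16000))  # 1 s of 16 kHz audio -> 49 frames
print(num_frames(68000))  # hypothetical ~4.25 s clip -> 212 frames
```

So files of different lengths give a different number of frames, which is why some pooling step is needed to get one fixed-size vector per file.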

If I want a single fixed-size embedding vector per file (either 1024-dim or 512-dim), would simply averaging over the 212 frames be a valid solution?

Source of the code: "Getting embeddings from wav2vec2 models in HuggingFace" (Stack Overflow)

import librosa
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Load the audio and resample it to 16 kHz, the rate the model was trained on
input_audio, sample_rate = librosa.load("/content/test.wav", sr=16000)

model_name = "facebook/wav2vec2-large-960h"
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
model = Wav2Vec2Model.from_pretrained(model_name)

# Normalize the waveform and wrap it in a batch of size 1
inputs = feature_extractor(input_audio, return_tensors="pt", sampling_rate=sample_rate)
with torch.no_grad():
    outputs = model(inputs.input_values)
print(outputs.keys())
print(outputs.last_hidden_state.shape)  # [1, 212, 1024] for my file
print(outputs.extract_features.shape)   # [1, 212, 512] for my file
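To make the question concrete, the simple averaging I have in mind (often called mean pooling over time) could be sketched as below. This is only an illustration: random tensors stand in for the real model outputs so the snippet runs without downloading anything, and the `mean_pool` helper and its optional `frame_mask` argument are hypothetical names, not part of transformers. The mask would only matter when several padded files are batched together.

```python
import torch


def mean_pool(hidden_states, frame_mask=None):
    """Average frame-level vectors over time: [batch, frames, dim] -> [batch, dim].

    frame_mask (hypothetical, optional): [batch, frames] tensor with 1 for real
    frames and 0 for padded frames, so padding does not skew the average.
    """
    if frame_mask is None:
        return hidden_states.mean(dim=1)
    mask = frame_mask.unsqueeze(-1).to(hidden_states.dtype)  # [batch, frames, 1]
    summed = (hidden_states * mask).sum(dim=1)                # [batch, dim]
    counts = mask.sum(dim=1).clamp(min=1.0)                   # avoid divide-by-zero
    return summed / counts


# Stand-in tensors with the shapes reported in the post (not real model outputs).
torch.manual_seed(0)
hidden = torch.randn(1, 212, 1024)    # same shape as last_hidden_state
features = torch.randn(1, 212, 512)   # same shape as extract_features

utt_hidden = mean_pool(hidden)
utt_features = mean_pool(features)
print(utt_hidden.shape)    # torch.Size([1, 1024])
print(utt_features.shape)  # torch.Size([1, 512])
```

With the real model, the same call would take the `last_hidden_state` or `extract_features` tensor in place of the random stand-ins, giving one 1024-dim or 512-dim vector per file.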
