What is Wav2Vec2FeatureExtractor doing?

bennicholl · April 3, 2022, 8:11pm

I have a very good understanding of traditional transformer architectures for NLP, but I have recently been given a task which requires raw audio.

I understand that tokenizers for BERT map a word to a specific index, where that index points to a word vector in a dictionary.

But when I feed raw audio data into Wav2Vec2FeatureExtractor, where the raw audio data looks like:

tensor([ 3.9514e-06,  1.0558e-04, -4.7315e-06,  ...,  3.9716e-04,
         2.4415e-04,  9.1544e-05])

I get back a tensor of float values of the same kind, which looks like:
tensor([[-0.0020, -0.0001, -0.0021, ..., 0.0051, 0.0024, -0.0004]])

What are these features that are being generated by Wav2Vec2FeatureExtractor? In NLP, the words are mapped to some vector representation, and that vector is the feature representation. So what are the values from the raw audio being mapped to?
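One way to probe this is the rough sketch below, which is not the library's actual code. It assumes the extractor was created with do_normalize=True and that in this case it only rescales the waveform to zero mean and unit variance, with no lookup table or learned embedding involved. The helper name normalize_waveform, the toy sample values, and the 1e-7 epsilon are assumptions for illustration. The numbers posted above look roughly consistent with this kind of rescaling, since the output keeps the same ordering and relative spacing as the input.

```python
import math


def normalize_waveform(samples, eps=1e-7):
    """Rescale a 1-D waveform to zero mean and roughly unit variance.

    Hypothetical helper: mimics what the feature extractor is assumed
    to do when do_normalize=True. eps is an assumed stabilising constant.
    """
    n = len(samples)
    mean = sum(samples) / n
    # Population variance (divide by n, not n - 1).
    var = sum((s - mean) ** 2 for s in samples) / n
    scale = math.sqrt(var + eps)
    return [(s - mean) / scale for s in samples]


# Toy waveform (made-up values, not the audio from the post).
raw = [0.01, -0.02, 0.03, 0.00, -0.01]
features = normalize_waveform(raw)

# One output value per input sample: the sequence is rescaled, not tokenized.
print(len(raw), len(features))
print(features)
```

If that assumption holds, the "features" here are still the waveform samples, only shifted and scaled; the learned representations would come later, inside the model itself.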
