Understanding Wav2Vec2Processor


I’m trying to use wav2vec2 to extract output feature vectors. My input is audio files, and I don’t want to use any information about their textual content.

This is the model I’m using:

from transformers import Wav2Vec2Model

model = Wav2Vec2Model.from_pretrained('facebook/wav2vec2-base')

It seems that a feature extractor should be defined as well:

from transformers import Wav2Vec2Processor

feature_extractor = Wav2Vec2Processor.from_pretrained('facebook/wav2vec2-base')

Then, when training:

inputs = feature_extractor(audio_file, return_tensors="pt", padding=True, feature_size=1, sampling_rate=16000)
output = model(**inputs)

What is the feature_extractor needed for? Doesn’t the model itself include a feature extractor?
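For context, my current guess from the docs is that the feature-extractor half of the processor does no learned feature extraction at all: it just normalizes each raw waveform to zero mean and unit variance and pads the batch to a common length, before the model’s internal CNN feature encoder ever runs. A rough numpy sketch of that preprocessing (illustrative only — `normalize_and_pad` is my own name, not a library function):

```python
import numpy as np

def normalize_and_pad(waveforms, pad_value=0.0):
    """Sketch of the host-side preprocessing a Wav2Vec2 feature
    extractor applies to raw audio: per-utterance zero-mean /
    unit-variance normalization, then padding to the longest
    waveform in the batch, plus an attention mask marking real
    samples. Illustrative only, not the actual implementation."""
    max_len = max(len(w) for w in waveforms)
    batch, masks = [], []
    for w in waveforms:
        w = np.asarray(w, dtype=np.float32)
        # zero-mean, unit-variance normalization per utterance
        w = (w - w.mean()) / np.sqrt(w.var() + 1e-7)
        pad = max_len - len(w)
        # 1 for real samples, 0 for padding
        masks.append(np.concatenate([np.ones(len(w)), np.zeros(pad)]))
        batch.append(np.concatenate([w, np.full(pad, pad_value, dtype=np.float32)]))
    return np.stack(batch), np.stack(masks)

# two fake "audio files" of different lengths
inputs, attention_mask = normalize_and_pad([np.random.randn(8), np.random.randn(5)])
print(inputs.shape)  # (2, 8)
```

If that is right, the processor only prepares tensors in the shape the model expects, which is separate from whatever the model’s own convolutional layers do to the waveform afterwards.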