Hi,
I’m trying to use wav2vec2 for its output feature vectors. My input is audio files, and I don’t want to use any information about their textual content.
This is the model I’m using:
model = Wav2Vec2Model.from_pretrained('facebook/wav2vec2-base')
It seems that a feature extractor should be defined as well:
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained('facebook/wav2vec2-base')
(I originally tried Wav2Vec2Processor, but it seems to expect a tokenizer as well, which this pretrained-only checkpoint doesn’t ship with.)
Then, when training:
inputs = feature_extractor(audio_file, return_tensors="pt", padding=True, sampling_rate=16000)
output = model(**inputs)
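For reference, here is the minimal end-to-end sketch I’m working from. I’ve substituted a 1-second dummy waveform for my real audio files, and I’m using `Wav2Vec2FeatureExtractor` on the assumption that only the acoustic preprocessing is needed here:

```python
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Pretrained model and its matching feature extractor
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base")

# Dummy 1-second mono waveform at 16 kHz, standing in for a real audio file
audio = np.random.randn(16000).astype(np.float32)

# The feature extractor normalizes the raw waveform and batches it as tensors
inputs = feature_extractor(audio, sampling_rate=16000, padding=True, return_tensors="pt")

model.eval()
with torch.no_grad():
    output = model(**inputs)

# Contextualized feature vectors, shape (batch, frames, hidden_size);
# hidden_size is 768 for the base model
print(output.last_hidden_state.shape)
```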
What is the feature_extractor needed for? Doesn’t the model itself include a feature extractor?