Get output embedding of FeatureExtractor

Can anyone point me to a way of getting the output of the feature extractor from a transformers model?
My specific case: I want to build a voiceprint identification system (n-shot), perhaps using wav2vec2 embeddings for automatic speaker identification. Since the model is already trained on a large amount of data, it should produce a reasonable voice embedding for audio input (the feature extractor, Wav2Vec2FeatureExtractor, is mainly used in the fine-tuning pipeline).
In particular, I want a 1x512 or 1x768 embedding from the stage before the mapping to text.
A Siamese network may be overkill here, but I'd at least like to try a smaller version.

Maybe the solution is somewhat similar to this (without the tokenizer), though it probably carries no information from the pretrained audio model:

from transformers import pipeline

feature_extraction = pipeline('feature-extraction', model="distilroberta-base", tokenizer="distilroberta-base")
features = feature_extraction("i am sentence")
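For the audio case, I'm guessing something along these lines would work, using Wav2Vec2Model directly instead of the pipeline. The checkpoint name and the mean-pooling over time are my own assumptions, not something I've verified gives good speaker embeddings:

```python
import torch
import numpy as np
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# checkpoint is an assumption; any wav2vec2 checkpoint should work the same way
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
model.eval()

# dummy 1-second waveform at 16 kHz standing in for real audio
speech = np.random.randn(16000).astype(np.float32)

inputs = extractor(speech, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

hidden = outputs.last_hidden_state   # shape (1, T, 768) for the base model
# mean-pool over the time axis to get one fixed-size vector per clip
embedding = hidden.mean(dim=1)       # shape (1, 768)
```

The resulting 1x768 vector could then be compared across clips with cosine similarity for the n-shot speaker identification idea above.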

Source: machine learning - Getting sentence embedding from huggingface Feature Extraction Pipeline - Stack Overflow