Get output embedding of FeatureExtractor

gorkemgoknar · April 20, 2021, 1:14pm

Can anyone point me to a way to get the output of the feature extractor from a transformers model?
My specific case: I want to build voiceprint identification (n-shot), possibly with wav2vec2 embeddings, for automatic speaker identification. Since the model is already trained on a large amount of data, it should at least give a proper voice embedding for audio data. (Wav2Vec2FeatureExtractor is mainly used in the fine-tuning pipeline.)
In particular, I want to get a 1x512 or 1x768 embedding before it is mapped to text.
It may be overkill for a Siamese network, but I could at least try it with a smaller version.
Thanks.
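
One possible way to get such a fixed-size vector is sketched below. It assumes a recent transformers release in which Wav2Vec2Model returns both extract_features (the 512-dimensional CNN encoder output) and last_hidden_state (the 768-dimensional transformer output), and it uses mean pooling over time, which is only one of several reasonable pooling choices. To run offline, the sketch builds a small randomly initialised model, so the numbers it produces are meaningless; loading pretrained weights (for example with Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")) would be needed for usable embeddings.

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2Model

# Assumption: random weights so the sketch runs without a download.
# For real embeddings, load a pretrained checkpoint instead, e.g.
# Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h").
config = Wav2Vec2Config(num_hidden_layers=2)  # defaults: CNN output 512, hidden size 768
model = Wav2Vec2Model(config).eval()

# One second of fake 16 kHz mono audio, shape (batch, samples).
waveform = torch.randn(1, 16000)

with torch.no_grad():
    outputs = model(waveform)

# Per-frame features from the two stages of the model.
cnn_frames = outputs.extract_features        # (1, frames, 512)
context_frames = outputs.last_hidden_state   # (1, frames, 768)

# Mean pooling over the time axis gives one vector per clip.
cnn_embedding = cnn_frames.mean(dim=1)           # (1, 512)
context_embedding = context_frames.mean(dim=1)   # (1, 768)
print(cnn_embedding.shape, context_embedding.shape)
```

With real recordings, the raw waveform would normally be passed through Wav2Vec2FeatureExtractor first, since it handles normalisation and padding before the model sees the audio.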

gorkemgoknar · April 20, 2021, 1:22pm

Maybe the solution is somewhat similar to this (without the tokenizer), though it will probably have no information from the pretrained model.

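from transformers import pipeline
feature_extraction = pipeline('feature-extraction', model="distilroberta-base", tokenizer="distilroberta-base")
# For a single string, the result is a nested list shaped [1][num_tokens][hidden_size]
features = feature_extraction("i am sentence")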

Source: machine learning - Getting sentence embedding from huggingface Feature Extraction Pipeline - Stack Overflow
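
The feature-extraction pipeline above returns one vector per token rather than a single embedding, so a pooling step is still needed before two inputs can be compared. The sketch below shows mean pooling followed by cosine similarity in plain Python. The helper names mean_pool and cosine_similarity and the tiny example values are made up for illustration; the same two steps would apply to per-frame audio features when comparing speakers.

```python
import math

def mean_pool(frames):
    """Average a [num_frames][dim] nested list into one [dim] vector."""
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up stand-in for pipeline output: [batch=1][tokens=3][hidden=4].
features = [[[1.0, 0.0, 2.0, 0.0],
             [3.0, 0.0, 2.0, 0.0],
             [2.0, 3.0, 2.0, 0.0]]]

embedding = mean_pool(features[0])
print(embedding)  # [2.0, 1.0, 2.0, 0.0]
```

For n-shot identification, one option is to average the pooled embeddings of a speaker's few enrolment clips and assign a new clip to the speaker whose average has the highest cosine similarity.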
