I want to use an existing pretrained audio transformer model as an audio-event embedding extractor, so I can generate latent feature representations of my input audio events (time-domain audio or spectrograms).
All the examples on Hugging Face either run inference on a given audio clip or fine-tune a transformer-based classifier.
Any links to examples where we get the embeddings (encoder outputs), i.e. the latent-space representations of the input before they are passed to the classifier?
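For reference, here is a minimal sketch of what I'm after, using `AutoModel` (which loads the base encoder without a classification head) and mean-pooling the frame-level hidden states into one clip-level embedding. The checkpoint name is just an example; any Wav2Vec2-style audio model should work, and the random waveform stands in for real audio:

```python
import torch
from transformers import AutoFeatureExtractor, AutoModel

# Example checkpoint; swap in any pretrained audio encoder
model_name = "facebook/wav2vec2-base-960h"
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)  # base model, no classifier head
model.eval()

# Dummy 1-second mono waveform at 16 kHz standing in for a real audio event
waveform = torch.randn(16000)

inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Frame-level embeddings: (batch, time_frames, hidden_size)
embeddings = outputs.last_hidden_state
# One clip-level embedding via mean pooling over the time axis
clip_embedding = embeddings.mean(dim=1)
print(embeddings.shape, clip_embedding.shape)
```

Is this the intended way to do it, or is there a recommended pipeline/API for pulling encoder outputs directly?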
@reach-vb @osanseviero any leads would be helpful.