Resources on interpretability of wav2vec-style speech models

Hello everyone

Big thanks to Hugging Face for creating this amazing framework, and to the active community as well! I've been using it for a while now and have been reading this forum regularly.

I am working on multilingual speech models and am interested in understanding how pre-trained wav2vec-style models represent input utterances (from a phonetics perspective, if possible). For example, I would like to know how a language identification model like the "VoxLingua107 Wav2Vec Spoken Language Identification Model" represents a collection of short utterances in English versus, say, Thai.

The most straightforward method I know is to take the final-layer output embeddings (in inference mode) and cluster them with t-SNE, but this hasn't helped much so far.
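For concreteness, here is a minimal sketch of that approach, extended to pull a mean-pooled embedding out of every layer rather than only the last one, since phonetic structure may sit in intermediate layers. The checkpoint is just a stand-in multilingual model, and the waveforms below are dummy noise to keep the snippet self-contained:

```python
import numpy as np
import torch
from sklearn.manifold import TSNE
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

model_name = "facebook/wav2vec2-xls-r-300m"  # stand-in multilingual checkpoint
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
model = Wav2Vec2Model.from_pretrained(model_name)
model.eval()

def layer_embeddings(waveform, sample_rate=16000):
    """Mean-pooled embedding of one utterance from every layer."""
    inputs = feature_extractor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # out.hidden_states: tuple of (1, time, dim) tensors, one per layer
    return [h.mean(dim=1).squeeze(0).numpy() for h in out.hidden_states]

# Replace these with real English/Thai utterances (1-D float arrays at 16 kHz).
waveforms = [np.random.randn(16000).astype(np.float32) for _ in range(20)]
labels = ["en"] * 10 + ["th"] * 10

layer = 8  # probe intermediate layers too, not only the final one
X = np.stack([layer_embeddings(w)[layer] for w in waveforms])
X_2d = TSNE(n_components=2, perplexity=5).fit_transform(X)
```

Colouring `X_2d` by `labels` then shows whether any given layer separates the two languages.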

I am looking for literature, code, frameworks (like Captum), and tutorials that use wav2vec-style models and focus on interpretability.
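To illustrate the kind of attribution analysis I have in mind with Captum: something like the rough sketch below, which runs Integrated Gradients on the raw waveform through a wav2vec2 classification head. The checkpoint is a stand-in classifier I picked for the example, not the VoxLingua107 model, and the audio is dummy noise:

```python
import numpy as np
import torch
from captum.attr import IntegratedGradients
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2ForSequenceClassification

# Stand-in checkpoint: any wav2vec2 model fine-tuned for classification should work.
model_name = "superb/wav2vec2-base-superb-sid"
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_name)
model = Wav2Vec2ForSequenceClassification.from_pretrained(model_name)
model.eval()

def forward_fn(input_values):
    return model(input_values).logits

waveform = np.random.randn(16000).astype(np.float32)  # replace with a real utterance
inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
input_values = inputs.input_values  # shape (1, num_samples)

# Attribute the predicted class back to individual waveform samples;
# the default all-zeros baseline corresponds to silence.
predicted = model(input_values).logits.argmax(dim=-1).item()
ig = IntegratedGradients(forward_fn)
attributions = ig.attribute(input_values, target=predicted, n_steps=20)
```

Any pointers to papers, repos, or tutorials along these lines would be much appreciated. Thank you!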
