How is the embedding model (x-vectors) trained?

laro1 · March 30, 2023, 7:42am

I read the paper “X-Vectors: Robust DNN Embeddings for Speaker Recognition”, which describes how the PyAnnote embedding block works.

I’m not sure I understand how the X-Vector model was trained and tested:

According to the paper, a DNN is trained on N speakers as a classification task.
Here’s the model:

To get embedding vectors, they drop the last 2 layers of the DNN.
They use LDA to reduce the embedding dimension from 512 to 150 and then run a PLDA model.
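
To check my understanding of the "drop the last 2 layers" part, here is a toy sketch in plain Python. It is not the paper's network or the pyannote code: the class and function names are my own, the weights are random and untrained, the sizes are tiny instead of 512, and the frame-level TDNN layers are skipped so that statistics pooling is applied directly to the input features. The only point is that the classifier and the embedding extractor share the same first layers, and the embedding is read from the affine layer right after pooling.

```python
import math
import random

def affine(layer, x):
    """Apply y = W x + b for a (weights, bias) pair."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(x):
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def random_layer(n_in, n_out, rng):
    """Untrained layer with random weights and zero bias (illustration only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def stats_pooling(frames):
    """Concatenate the per-dimension mean and standard deviation over time."""
    n = len(frames)
    dims = range(len(frames[0]))
    means = [sum(f[d] for f in frames) / n for d in dims]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n) for d in dims]
    return means + stds

class ToySpeakerNet:
    """Hypothetical stand-in: pooling -> embedding layer -> hidden -> softmax."""

    def __init__(self, feat_dim, emb_dim, n_speakers, seed=0):
        rng = random.Random(seed)
        self.embedding_layer = random_layer(2 * feat_dim, emb_dim, rng)
        self.hidden_layer = random_layer(emb_dim, emb_dim, rng)
        self.output_layer = random_layer(emb_dim, n_speakers, rng)

    def embed(self, frames):
        # Stop at the affine output of the first layer after pooling.
        return affine(self.embedding_layer, stats_pooling(frames))

    def classify(self, frames):
        # Training would use this full path: the last two layers sit on top
        # of the embedding and are discarded once training is finished.
        h = relu(self.embed(frames))
        h = relu(affine(self.hidden_layer, h))
        return softmax(affine(self.output_layer, h))

rng = random.Random(1)
utterance = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(50)]  # 50 frames, 4 features
net = ToySpeakerNet(feat_dim=4, emb_dim=8, n_speakers=3)
embedding = net.embed(utterance)      # fixed-size vector, independent of N speakers
posteriors = net.classify(utterance)  # one probability per training speaker
print(len(embedding), len(posteriors))
```

If this is right, the embedding size does not depend on the number of training speakers, and LDA + PLDA would be a separate back end applied to the output of `embed`.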

What is the “total context” in the model?
If they trained the DNN on an N-speaker classification task, why do they need to run LDA + PLDA?
Are there any learned parameters in the LDA + PLDA step?
To produce an embedding vector, do we also need to run the LDA + PLDA step?
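
On the first question, my tentative reading is that "total context" is the number of input frames that one output frame of a layer depends on, accumulated through the stack of frame-level layers. Below is a small sketch of that arithmetic, assuming the per-layer frame offsets from the paper's architecture table (the function name is mine, and I may be misreading the table):

```python
def total_context(layer_offsets):
    """Return the cumulative number of input frames seen after each layer."""
    left, right = 0, 0
    totals = []
    for offsets in layer_offsets:
        left += -min(offsets)   # how far back this layer reaches
        right += max(offsets)   # how far forward this layer reaches
        totals.append(left + right + 1)
    return totals

# Frame offsets relative to the current frame t, one list per frame-level layer.
frame_layers = [
    [-2, -1, 0, 1, 2],  # frame1
    [-2, 0, 2],         # frame2
    [-3, 0, 3],         # frame3
    [0],                # frame4
    [0],                # frame5
]
print(total_context(frame_layers))  # [5, 9, 15, 15, 15]
```

Under that reading, each frame-level output would see 15 input frames before the statistics pooling layer aggregates over the whole segment.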
