Diarization with unknown number of speakers

Ollie · April 28, 2022, 11:49am

Hi there!

I’m looking into audio diarization, with the caveat that the number of speakers is not known beforehand. This rules out models that need the number of speakers up front, such as Wav2Vec2ForAudioFrameClassification.

My approach was to run Wav2Vec2ForXVector on each audio snippet and then cluster the resulting vectors with agglomerative clustering, using cosine distance and a fixed value for distance_threshold. Although the results on the training data were good (confusion matrix, etc.), they are very sensitive to the threshold value, to the point where the approach doesn’t generalize well at all.
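A minimal sketch of the clustering step described above, for anyone who wants to reproduce the setup. It assumes the x-vectors are already stacked into a NumPy array with one row per snippet (for example, the `.embeddings` output of Wav2Vec2ForXVector). `make_toy_embeddings` is a hypothetical helper that generates synthetic, well-separated "speakers" so the code runs without a model. The merge loop is plain NumPy and behaves roughly like scikit-learn's `AgglomerativeClustering(n_clusters=None, distance_threshold=..., metric="cosine", linkage="average")` (the `metric` argument was called `affinity` before scikit-learn 1.2). Average linkage is an assumption, since the post doesn't say which linkage was used.

```python
import numpy as np


def cosine_distance_matrix(x: np.ndarray) -> np.ndarray:
    """Pairwise cosine distances (1 - cosine similarity) between the rows of x."""
    normed = x / np.linalg.norm(x, axis=1, keepdims=True)
    return 1.0 - normed @ normed.T


def agglomerative_cluster(embeddings: np.ndarray, distance_threshold: float) -> np.ndarray:
    """Average-linkage agglomerative clustering without a fixed number of clusters.

    Repeatedly merges the two closest clusters and stops once the smallest
    average cosine distance between any two clusters is >= distance_threshold.
    Returns one integer label per row of `embeddings`.
    """
    dist = cosine_distance_matrix(embeddings)
    clusters = [[i] for i in range(len(embeddings))]  # start: one cluster per snippet
    while len(clusters) > 1:
        best = None  # (distance, i, j) for the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs.
                d = dist[np.ix_(clusters[i], clusters[j])].mean()
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d >= distance_threshold:
            break  # no remaining pair is close enough to merge
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    labels = np.empty(len(embeddings), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


def make_toy_embeddings(n_speakers: int = 3, per_speaker: int = 4, dim: int = 16,
                        noise: float = 0.02, seed: int = 0) -> np.ndarray:
    """Hypothetical stand-in for real x-vectors: each speaker is a noisy copy
    of its own basis vector, so different speakers are nearly orthogonal."""
    rng = np.random.default_rng(seed)
    rows = [np.eye(dim)[k] + rng.normal(scale=noise, size=dim)
            for k in range(n_speakers) for _ in range(per_speaker)]
    return np.stack(rows)


if __name__ == "__main__":
    # In practice: embeddings = model(**inputs).embeddings from Wav2Vec2ForXVector,
    # converted to a NumPy array with one row per audio snippet.
    embeddings = make_toy_embeddings()
    for threshold in (0.001, 0.01, 0.1, 0.5, 1.0, 1.5):
        labels = agglomerative_cluster(embeddings, threshold)
        print(f"distance_threshold={threshold}: {len(set(labels.tolist()))} speakers")
```

The sweep at the end prints how many speakers are found at each threshold. On these toy vectors, any threshold between the within-speaker distances (well under 0.1) and the between-speaker distances (close to 1.0) yields three clusters. Real x-vectors are likely noisier and less cleanly separated, so that safe range shrinks, which fits the sensitivity described above.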

Has anyone attempted this problem and obtained more stable results?

Thanks!

smarterbizomkar · October 28, 2022, 6:06pm

Hi,

Can you try pyannote-audio (GitHub: pyannote/pyannote-audio)? It provides neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, and speaker embedding.
