I'm using pyannote for speaker diarization, based on the following segmentation model:
End-to-end speaker segmentation for overlap-aware resegmentation
In that paper, under Implementation details, the authors write:
- model input: sequences of 80,000 samples
[i.e. 5 s audio chunks at a 16 kHz sampling rate]
- model output: K_max-dimensional speaker activations between 0 and 1, every 16 ms.
- Does this mean the output shape is (K_max, 5000/16), i.e. roughly (K_max, 312)?
- The output values are between 0 and 1 — how should they be interpreted?
- How do we decide whether a new segment starts, how many segments each output contains, and how many speakers are active in it? (An example would be very helpful.)
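To make the last question concrete, here is how I currently imagine the post-processing would work — binarize each speaker's activation with a threshold and merge consecutive active frames into segments. The 0.5 threshold, the frame count of 312 (5000 ms / 16 ms), and K_max = 3 are my own assumptions for illustration, not values taken from the paper:

```python
import numpy as np

# Dummy activations with shape (num_frames, K_max), values in [0, 1].
# Assumed: 5 s / 16 ms ~ 312 frames, K_max = 3 speakers.
rng = np.random.default_rng(0)
activations = rng.random((312, 3))

THRESHOLD = 0.5          # assumed binarization threshold
FRAME_DURATION = 0.016   # seconds per output frame (16 ms)

def activations_to_segments(act, threshold=THRESHOLD):
    """Binarize per-speaker activations and merge runs of consecutive
    active frames into (start_frame, end_frame) segments."""
    segments = {}
    active = act >= threshold  # boolean array, shape (frames, speakers)
    for spk in range(active.shape[1]):
        spk_segments = []
        start = None
        for frame, is_active in enumerate(active[:, spk]):
            if is_active and start is None:
                start = frame                      # segment opens
            elif not is_active and start is not None:
                spk_segments.append((start, frame))  # segment closes
                start = None
        if start is not None:                      # still active at chunk end
            spk_segments.append((start, active.shape[0]))
        segments[spk] = spk_segments
    return segments

segments = activations_to_segments(activations)
for spk, segs in segments.items():
    # Convert frame indices to seconds for readability.
    times = [(round(s * FRAME_DURATION, 3), round(e * FRAME_DURATION, 3))
             for s, e in segs]
    print(f"speaker {spk}: {len(segs)} segments, first few: {times[:3]}")
```

Is this the intended way to read the output, or does pyannote apply a different binarization (e.g. with hysteresis or minimum-duration rules)?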