What do 'min_duration_off' and 'threshold' mean (segmentation)?

Printing the parameters of the pyannote.audio speaker-diarization pipeline
with pipeline.parameters(instantiated=True) gives:

{
    'segmentation': {
        'min_duration_off': 0.5817029604921046,
        'threshold': 0.4442333667381752
    },
    'clustering': {
        'method': 'centroid',
        'min_cluster_size': 15,
        'threshold': 0.7153814381597874
    }
}

I read the paper on the segmentation model (End-to-end speaker segmentation for overlap-aware resegmentation)
and still don't understand what min_duration_off and threshold mean.

min_duration_on - remove speech regions shorter than that many seconds.
min_duration_off - fill non-speech regions shorter than that many seconds.
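
To make the two definitions above concrete, here is a minimal sketch of that kind of post-processing in plain Python. It is not pyannote.audio's actual code. The function name binarize, the frame_duration argument, and the order of the steps are assumptions made for illustration, and it assumes threshold is the score above which a frame counts as speech.

```python
def binarize(scores, frame_duration, threshold,
             min_duration_on=0.0, min_duration_off=0.0):
    """Turn per-frame speech scores into (start, end) regions in seconds.

    Hypothetical helper for illustration only, not part of pyannote.audio.
    """
    # Step 1: a frame is "speech" when its score exceeds the threshold.
    regions = []
    start = None
    for i, score in enumerate(scores):
        active = score > threshold
        if active and start is None:
            start = i
        elif not active and start is not None:
            regions.append((start * frame_duration, i * frame_duration))
            start = None
    if start is not None:
        regions.append((start * frame_duration, len(scores) * frame_duration))

    # Step 2: fill non-speech gaps shorter than min_duration_off
    # by merging the speech regions on either side of the gap.
    merged = []
    for seg_start, seg_end in regions:
        if merged and seg_start - merged[-1][1] < min_duration_off:
            merged[-1] = (merged[-1][0], seg_end)
        else:
            merged.append((seg_start, seg_end))

    # Step 3: remove speech regions shorter than min_duration_on.
    return [(s, e) for s, e in merged if e - s >= min_duration_on]


# Ten frames of 0.25 s each (made-up scores).
scores = [0.9, 0.8, 0.2, 0.7, 0.9, 0.1, 0.1, 0.1, 0.1, 0.6]

# Threshold only: three separate speech regions.
print(binarize(scores, 0.25, threshold=0.5))

# The 0.25 s gap is shorter than min_duration_off, so it gets filled.
print(binarize(scores, 0.25, threshold=0.5, min_duration_off=0.5))
```

In this sketch, raising threshold makes fewer frames count as speech, and raising min_duration_off merges more neighbouring regions across short pauses.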
