How to interpret the output of the segmentation model?

pyannote's speaker diarization is based on the segmentation model from the following paper:
End-to-end speaker segmentation for overlap-aware resegmentation

Under "Implementation details", the paper states:

  • model input: sequences of 80000 samples
    [i.e., 5 s audio chunks at a sampling rate of 16 kHz]
  • model output:
    K_max-dimensional speaker activations between 0 and 1 every 16 ms.

1. Does this mean that the output shape is (K_max, 5000/16)?
2. The output values are between 0 and 1. How should they be interpreted?
   How can I tell from each output whether a new segment starts, how many segments there are, and how many speakers are present? (An example would be very helpful.)
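
To make the two questions concrete, here is a minimal sketch of one possible reading, not pyannote's actual API. It assumes that frames are spaced exactly 16 ms apart, that the output is laid out as frames x speaker channels, and that a 0.5 threshold separates "active" from "inactive"; the `toy` values and the `active_runs` helper are made up for illustration.

```python
# Values quoted from the paper's "Implementation details".
SAMPLE_RATE_HZ = 16_000
NUM_SAMPLES = 80_000
FRAME_STEP_MS = 16

# Integer arithmetic: 80000 samples at 16 kHz is 5000 ms of audio.
chunk_ms = NUM_SAMPLES * 1000 // SAMPLE_RATE_HZ
# Rough frame count if frames are spaced exactly 16 ms apart (assumption;
# the real model may emit a slightly different number of frames).
approx_frames = chunk_ms // FRAME_STEP_MS


def active_runs(activations, threshold=0.5):
    """Return (start_frame, end_frame) runs where activation >= threshold.

    end_frame is exclusive. threshold=0.5 is an illustrative choice.
    """
    runs, start = [], None
    for i, value in enumerate(activations):
        if value >= threshold and start is None:
            start = i                      # a new segment begins here
        elif value < threshold and start is not None:
            runs.append((start, i))        # the current segment ends here
            start = None
    if start is not None:                  # segment still open at chunk end
        runs.append((start, len(activations)))
    return runs


# Hypothetical toy output: 6 frames x 3 speaker channels (assumed layout).
toy = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.7, 0.6, 0.0],   # two channels active at once: overlapped speech
    [0.1, 0.9, 0.0],
    [0.1, 0.8, 0.0],
    [0.6, 0.1, 0.0],
]

num_channels = len(toy[0])
runs_per_channel = [
    active_runs([frame[k] for frame in toy]) for k in range(num_channels)
]
# Channels that are active at least once in this chunk.
num_active_speakers = sum(1 for runs in runs_per_channel if runs)
# Number of simultaneously active channels in each frame.
speakers_per_frame = [sum(v >= 0.5 for v in frame) for frame in toy]

print(approx_frames, runs_per_channel, num_active_speakers, speakers_per_frame)
```

Under these assumptions, each channel's runs are that speaker's segments within the chunk (multiply frame indices by 0.016 to get seconds), and a frame with more than one active channel is overlapped speech. As far as I understand the paper, the channel order is only meaningful within a single chunk, so channel 0 in one chunk need not be the same person as channel 0 in the next.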