I’m trying to use pyannote for speaker diarization, but I’m getting the wrong number of speakers. Every example I’ve tried gives wrong results.
For example:

- I used this YouTube video: https://www.youtube.com/watch?v=b2_ZZ2UpSzI
- I converted it to a WAV file with a sample rate of 16000 Hz.
- I ran the following code:
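```python
from pyannote.audio import Pipeline

TEST_FILE = "example.wav"  # the 16 kHz WAV converted from the video
MY_TOKEN = "..."  # Hugging Face access token

# Load the pretrained speaker diarization pipeline from the Hugging Face Hub
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization",
                                    use_auth_token=MY_TOKEN)

# Run diarization on the whole file
diarization = pipeline(TEST_FILE)
```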
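To rule out a problem with the conversion step above, here is a minimal sketch that reads back the WAV header with Python's standard `wave` module and reports what the file actually contains. The function name `describe_wav` is hypothetical, and it assumes the file is uncompressed PCM (the only kind `wave` can open).

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, sample_width_bytes, duration_s)
    for an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        width = wav.getsampwidth()
        # Duration = number of frames divided by frames per second
        duration = wav.getnframes() / rate
    return rate, channels, width, duration
```

For example, `describe_wav("example.wav")` should report a rate of `16000` if the conversion did what was intended; the channel count and duration are worth checking against the original video as well.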
The resulting diarization contains only 2 speakers, but the ground truth (GT) contains 4.
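To see exactly how many speakers the pipeline found and how much speech was assigned to each, a minimal sketch is to total the duration per label. It assumes the input is an iterable of `(segment, track, label)` triples, which is what pyannote.core's `Annotation.itertracks(yield_label=True)` yields, with each segment exposing `.start` and `.end` in seconds. The `Seg` namedtuple and the helper `summarize_speakers` are hypothetical stand-ins so the snippet runs without pyannote installed.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for pyannote.core.Segment (only .start/.end are used)
Seg = namedtuple("Seg", ["start", "end"])


def summarize_speakers(tracks):
    """Map each speaker label to its total speaking time in seconds.

    `tracks` is an iterable of (segment, track, label) triples where each
    segment has .start and .end attributes in seconds.
    """
    totals = defaultdict(float)
    for segment, _track, label in tracks:
        totals[label] += segment.end - segment.start
    return dict(totals)


if __name__ == "__main__":
    # Made-up demo turns, not real output from the video above
    demo = [
        (Seg(0.0, 4.5), "A", "SPEAKER_00"),
        (Seg(4.5, 9.0), "B", "SPEAKER_01"),
        (Seg(9.0, 12.0), "C", "SPEAKER_00"),
    ]
    speech = summarize_speakers(demo)
    print(f"{len(speech)} speakers detected")
    for label, seconds in sorted(speech.items()):
        print(f"{label}: {seconds:.1f} s")
```

With the real output, `summarize_speakers(diarization.itertracks(yield_label=True))` should give the per-speaker totals. If the number of speakers is known in advance, the pyannote/speaker-diarization documentation also describes passing `num_speakers` (or `min_speakers`/`max_speakers`) when calling the pipeline, e.g. `pipeline(TEST_FILE, num_speakers=4)`; it is worth confirming that the installed version supports these options.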
How can I tweak pyannote to get better results?