I read that pyannote uses this embedding model:
X-VECTORS: ROBUST DNN EMBEDDINGS FOR SPEAKER RECOGNITION
- According to the paper, the model was trained on audio files with a sample rate of 8 kHz.
- The model is a classification model whose final layer has N outputs (N = number of different speakers).
- The input was up to 3 seconds long, with a frame length of 25 ms.
- What is the value of N? (I couldn't find it in the paper.)
- Given the 25 ms frame length, is it correct that no assumption is made about the total speech length? That is, can we get an embedding vector for speech of any length?
- If the model was trained on speech with a sample rate of 8 kHz, do I need to resample audio to 8 kHz before extracting its embedding vector?
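For context on the resampling question: if the embedding model was indeed trained on 8 kHz audio, a common approach is to resample any input to that rate before feeding it to the model. A minimal sketch using `scipy.signal.resample_poly` (this is my own illustrative helper, not part of pyannote's API; the `to_8k` name and the assumption that 8 kHz is the required rate are mine):

```python
import numpy as np
from math import gcd
from scipy.signal import resample_poly

def to_8k(waveform, orig_sr):
    """Resample a 1-D waveform to 8 kHz (assumed model training rate)."""
    target_sr = 8000
    if orig_sr == target_sr:
        return waveform
    g = gcd(orig_sr, target_sr)
    # Polyphase resampling with the reduced up/down ratio
    return resample_poly(waveform, target_sr // g, orig_sr // g)

# Example: 1 second of a 440 Hz tone recorded at 16 kHz
sr = 16000
t = np.arange(sr)
x = np.sin(2 * np.pi * 440 * t / sr)

y = to_8k(x, sr)
print(len(y))  # 8000 samples: still 1 second, now at 8 kHz
```

The resampled waveform (rather than the raw 16 kHz or 44.1 kHz audio) would then be passed to the feature-extraction / embedding step.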