Hi,
I’ve been benchmarking WavLMForXVector on LibriSpeech data and get an EER of 4.7%, whereas the WavLM paper (Table II) reports an EER of 0.84% for WavLM Base+.
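When comparing EER numbers, it helps to be sure both sides compute EER the same way. The sketch below computes it from a list of trial scores (for example, the cosine similarities the docs example produces) and same/different-speaker labels. The function name `compute_eer` is hypothetical. It picks the operating point where the false-accept and false-reject rates are closest, without interpolating between points, so results may differ slightly from tools that interpolate the ROC curve.

```python
import numpy as np


def compute_eer(scores, labels):
    """Equal error rate for speaker-verification trials (hypothetical helper).

    scores: one similarity score per trial (higher = more likely same speaker).
    labels: 1 for target (same-speaker) trials, 0 for non-target trials.
    Returns (eer, threshold).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_target = labels.sum()
    n_nontarget = len(labels) - n_target

    # Sort trials by descending score; lowering the threshold past each
    # sorted score accepts exactly one more trial.
    order = np.argsort(-scores, kind="stable")
    sorted_labels = labels[order]
    sorted_scores = scores[order]

    # Counts after accepting the first i+1 trials.
    false_accepts = np.cumsum(1 - sorted_labels)  # non-targets accepted
    true_accepts = np.cumsum(sorted_labels)       # targets accepted
    far = false_accepts / n_nontarget
    frr = 1.0 - true_accepts / n_target

    # Prepend the "accept nothing" operating point (FAR = 0, FRR = 1).
    far = np.concatenate([[0.0], far])
    frr = np.concatenate([[1.0], frr])
    thresholds = np.concatenate([[np.inf], sorted_scores])

    # EER: the point where FAR and FRR are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return float(eer), float(thresholds[idx])
```

Because EER sweeps over all thresholds, it does not depend on the fixed similarity threshold used in the docs example.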
I used the example code from the WavLM docs, but loaded the data from a hard drive with the soundfile library instead. I also noticed that the example code seems to be missing the adaptive s-norm step used in the paper, but I wonder whether that alone could make performance this much worse.
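For reference, adaptive symmetric score normalization (AS-Norm) rescales each raw trial score using statistics from a cohort of impostor embeddings: for each side of the trial, it takes the top-K cohort scores against that embedding and normalizes by their mean and standard deviation. The sketch below is a generic AS-Norm, not necessarily the exact variant or settings used in the paper. The function names, the default `top_k=300`, and the idea of drawing the cohort from speakers outside the trial list are all assumptions.

```python
import numpy as np


def l2_normalize(x):
    # Scale embeddings to unit length so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def as_norm_score(enroll, test, cohort, top_k=300):
    """Adaptive symmetric score normalization for one trial (hypothetical helper).

    enroll, test: (D,) speaker embeddings for the trial.
    cohort: (N, D) embeddings from speakers not in the trial list (assumption).
    top_k: number of highest-scoring cohort entries to use (assumed default).
    """
    enroll, test, cohort = l2_normalize(enroll), l2_normalize(test), l2_normalize(cohort)
    raw = float(enroll @ test)  # raw cosine score of the trial

    def top_k_stats(emb):
        # Cosine scores of this embedding against every cohort embedding,
        # keeping only the top-k closest impostors.
        cohort_scores = cohort @ emb
        k = min(top_k, len(cohort_scores))
        top = np.sort(cohort_scores)[-k:]
        return top.mean(), top.std()

    mu_e, sd_e = top_k_stats(enroll)
    mu_t, sd_t = top_k_stats(test)
    # Average of the enrollment-side and test-side normalized scores.
    return 0.5 * ((raw - mu_e) / sd_e + (raw - mu_t) / sd_t)
```

Score normalization mainly calibrates scores across speakers, so comparing the EER with and without it on the same trial list is one way to estimate how much of the gap it explains.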
Any ideas what I’m getting wrong?