According to the WavLM paper (WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing), the authors used an ECAPA-TDNN embedding model for the downstream speaker verification task.
I searched but couldn't find one: is there an implementation of this that I can use with the model (i.e. WavLM-based embeddings produced by ECAPA-TDNN)?
This is what I tried:
import torch
import soundfile as sf
from transformers import Wav2Vec2FeatureExtractor, WavLMForXVector

# Load the audio as a NumPy array; the checkpoint expects 16 kHz input.
wav_tensor, sr = sf.read(r"nyfile.wav")

device = "cuda" if torch.cuda.is_available() else "cpu"
feature_extractor_wav2vec = Wav2Vec2FeatureExtractor.from_pretrained("microsoft/wavlm-base-plus-sv")
model_wav_lm = WavLMForXVector.from_pretrained("microsoft/wavlm-base-plus-sv").to(device)

# Preprocess the waveform and move the resulting tensors to the same device as the model.
inputs = feature_extractor_wav2vec(
    wav_tensor, sampling_rate=16000, return_tensors="pt", padding=True
).to(device)

# Inference only, so gradients are not needed.
with torch.no_grad():
    embeddings = model_wav_lm(**inputs).embeddings
I couldn't tell whether these embeddings come from an ECAPA-TDNN head or from an X-Vector head.
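One thing I could try in order to check this myself is listing the top-level submodules of the loaded model. This is only a minimal sketch: `summarize_head` is a hypothetical helper name of mine, and it assumes nothing beyond the standard `named_children()` method that every `torch.nn.Module` provides.

```python
def summarize_head(model):
    """Return the model's class name and (name, type) pairs of its top-level submodules.

    Hypothetical helper: it only relies on `named_children()`, which any
    torch.nn.Module exposes, so it makes no assumption about the architecture.
    """
    children = [
        (name, type(module).__name__) for name, module in model.named_children()
    ]
    return type(model).__name__, children
```

Calling `summarize_head(model_wav_lm)` on the model loaded above would show which layers sit on top of the WavLM encoder, but I am not sure how to tell from those names alone whether the head corresponds to ECAPA-TDNN or to X-Vector.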