Hi all,
Can you pass a WAV file to a pre-trained XLSR model and get embeddings,
then later, depending on a flag, pass those embeddings through different ASR (English/Spanish) layers?
The idea is that I will have an input on which I perform language identification and from which I extract the wav2vec embeddings; then, depending on the detected language, I pass those embeddings through the selected ASR head.
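To make the question concrete, here is a minimal PyTorch sketch of the setup I have in mind. Everything in it is an assumption for illustration: `ToyEncoder` is only a stand-in for the pre-trained XLSR model, and the class names, vocabulary sizes, and the mean-pooled language-ID head are made up.

```python
import torch
import torch.nn as nn


class ToyEncoder(nn.Module):
    """Hypothetical stand-in for a pre-trained XLSR / wav2vec 2.0 encoder."""

    def __init__(self, hidden_size, frame_len=320):
        super().__init__()
        # One strided convolution turns raw samples into frame-level features.
        self.conv = nn.Conv1d(1, hidden_size, kernel_size=frame_len, stride=frame_len)

    def forward(self, waveform):
        # (batch, samples) -> (batch, 1, samples) -> (batch, frames, hidden_size)
        return self.conv(waveform.unsqueeze(1)).transpose(1, 2)


class SharedEncoderMultiHead(nn.Module):
    """Shared encoder, a language-ID head, and one ASR head per language."""

    def __init__(self, encoder, hidden_size, vocab_sizes):
        super().__init__()
        self.encoder = encoder
        self.languages = list(vocab_sizes)
        self.lid_head = nn.Linear(hidden_size, len(self.languages))
        # One linear (CTC-style) output layer per language, selected by key.
        self.asr_heads = nn.ModuleDict(
            {lang: nn.Linear(hidden_size, size) for lang, size in vocab_sizes.items()}
        )

    def forward(self, waveform, language=None):
        embeddings = self.encoder(waveform)  # computed once, shared by all heads
        if language is None:
            # Assumption: the whole batch is in one language, so pool over
            # time and over the batch before picking the most likely language.
            lid_logits = self.lid_head(embeddings.mean(dim=1))
            language = self.languages[int(lid_logits.mean(dim=0).argmax())]
        # Route the same embeddings through the selected ASR head only.
        return language, self.asr_heads[language](embeddings)


model = SharedEncoderMultiHead(ToyEncoder(32), 32, {"en": 30, "es": 34})
waveform = torch.randn(1, 16000)  # one second of fake 16 kHz audio
language, logits = model(waveform, language="es")  # explicit flag
print(language, logits.shape)  # logits are (batch, frames, vocab size of that head)
```

Passing `language=None` instead would let the language-ID head choose the ASR head, which is the flag-based routing I am asking about.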
Is that doable?