I think that may be because the paper adds a Transformer LM to improve performance. By the way, has there been any discussion of how to combine a Transformer LM with Hugging Face’s wav2vec2 model? I found a blog article by @patrickvonplaten showing how to boost wav2vec2 with an n-gram LM, but I don’t yet know how to combine the model with a Transformer LM.