The wav2vec2.0 base 960h model never seems to return a beginning-of-sentence (`<s>`) or end-of-sentence (`</s>`) token (nor, so far, an apostrophe `'` or `<unk>`) when I use greedy decoding. Is that expected? I can't find this discussed anywhere. Or is the audio I'm feeding in simply too difficult for the model to determine BOS/EOS? If so, can someone provide a counter-example?
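One way to check this directly is to inspect the raw per-frame argmax ids, before any tokenizer post-processing, and count which rare tokens survive greedy CTC collapsing. A plausible explanation, not confirmed here, is that the CTC fine-tuning targets are character sequences with `|` word delimiters and no `<s>`/`</s>`, so those ids may never win the per-frame argmax. The sketch below is pure Python and assumes a wav2vec2-style vocabulary where `<pad>` doubles as the CTC blank; the vocabulary excerpt, its ids, and all function names are hypothetical and should be replaced with the values from your checkpoint's `vocab.json`.

```python
from collections import Counter
from typing import Dict, List, Sequence

# ASSUMPTION: an excerpt of a wav2vec2-style character vocabulary. The ids
# are illustrative; read the real ones from your checkpoint's vocab.json.
VOCAB: Dict[str, int] = {
    "<pad>": 0,  # doubles as the CTC blank
    "<s>": 1,
    "</s>": 2,
    "<unk>": 3,
    "|": 4,      # word delimiter
    "E": 5,
    "T": 6,
    "H": 11,
    "'": 27,
}
ID_TO_TOKEN: Dict[int, str] = {i: t for t, i in VOCAB.items()}
BLANK_ID = VOCAB["<pad>"]
RARE_TOKENS = {"<s>", "</s>", "<unk>", "'"}


def greedy_ctc_collapse(frame_ids: Sequence[int]) -> List[int]:
    """Merge consecutive repeated ids, then drop CTC blanks."""
    merged = [t for k, t in enumerate(frame_ids) if k == 0 or t != frame_ids[k - 1]]
    return [t for t in merged if t != BLANK_ID]


def rare_token_counts(frame_ids: Sequence[int]) -> Counter:
    """Count <s>, </s>, <unk> and apostrophes left after greedy decoding."""
    kept = greedy_ctc_collapse(frame_ids)
    return Counter(ID_TO_TOKEN[t] for t in kept if ID_TO_TOKEN.get(t) in RARE_TOKENS)


def to_text(frame_ids: Sequence[int]) -> str:
    """Join surviving tokens, mapping the '|' delimiter to a space."""
    tokens = [ID_TO_TOKEN.get(t, "?") for t in greedy_ctc_collapse(frame_ids)]
    return "".join(" " if t == "|" else t for t in tokens).strip()


if __name__ == "__main__":
    # Hypothetical per-frame argmax ids for a short clip.
    frames = [0, 6, 6, 11, 0, 5, 5, 0, 4, 0]
    print(to_text(frames))            # THE
    print(rare_token_counts(frames))  # Counter()
```

With a real model, the per-frame ids would come from the logits, for example `logits.argmax(dim=-1)[0].tolist()` in PyTorch. Counting on the raw ids rather than on decoded text avoids any special-token filtering the tokenizer might apply during decoding.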