Hi, I’m trying to train a Tamil model. I ran the code as explained in Patrick’s video, but I ran into this error. Can you help me understand the reason for it?
This is my colab notebook
Here a shareable link of the notebook: https://colab.research.google.com/drive/1SSmJywEvx07TtQSRtSFpxRawzb1lnXIC?usp=sharing
The colab seems to work fine with me - it’s training when I run it: https://colab.research.google.com/drive/1NCoaTUx1ntjwO1ZgdvM0tlPFehBTBp7t?usp=sharing
You should set the vocab_size in Wav2Vec2ForCTC.from_pretrained().
This also happens if the token you have selected is part of the language’s vocabulary. In Hindi (and other Devanagari scripts) the pipe "|"
is used instead of a full stop. So be careful to select a token that is not part of the normal language vocabulary.
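For what it’s worth, a quick way to guard against this is to check the chosen delimiter against the characters that actually occur in the transcripts before building the vocab. A minimal sketch (the transcript, the delimiter, and the fallback symbol "#" are all illustrative assumptions):

```python
# Hypothetical transcript where "|" appears as sentence punctuation,
# as described for Hindi/Devanagari above.
transcripts = ["यह एक वाक्य है|"]

delimiter = "|"  # the default word_delimiter_token in the notebook
chars = set("".join(transcripts))

if delimiter in chars:
    # "|" is part of the language text itself, so pick another symbol
    # that does not occur in any transcript.
    delimiter = "#"

print(delimiter)  # any symbol absent from the transcripts is fine
```

If you do change the delimiter, remember to pass it as word_delimiter_token when building the Wav2Vec2CTCTokenizer, so the tokenizer and the vocab file stay consistent.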
@patrickvonplaten Hi, I also have this issue, and it seems related to the new vocab size being larger than the vocab size used during pretraining. I suppose the model is reusing the weights of the pretrained model (the lm_head layer). Is there a simple way to update the dimension (similar to model.resize_token_embeddings for language models), or can it only be done manually, e.g. model.lm_head = nn.Linear(…)?
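A minimal sketch of the size mismatch being asked about (the vocab sizes below are hypothetical stand-ins, and the commented-out lines are the manual route, not an official API):

```python
# Hypothetical sizes: the pretraining head was built for 32 tokens,
# while the fine-tuning tokenizer has 64 tokens.
pretrained_vocab_size = 32   # head size baked into the checkpoint
finetune_vocab_size = 64     # stands in for len(processor.tokenizer)

# A mismatched CTC head cannot reuse the pretrained weights; it has to
# be re-created with the new output dimension, roughly:
#
#   import torch.nn as nn
#   model.lm_head = nn.Linear(model.config.hidden_size, finetune_vocab_size)
#   model.config.vocab_size = finetune_vocab_size
#
needs_new_head = finetune_vocab_size != pretrained_vocab_size
print(needs_new_head)
```

The simpler route, as the posts above suggest, is to pass vocab_size=len(processor.tokenizer) directly to Wav2Vec2ForCTC.from_pretrained() so the head is created with the right size from the start.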
For posterity, the vocab size is set there in the last parameter and should be the number of distinct characters. It is correct in the original notebook; I don’t know if it was corrected…
model = Wav2Vec2ForCTC.from_pretrained(
"facebook/wav2vec2-large-xlsr-53",
attention_dropout=0.1,
hidden_dropout=0.1,
feat_proj_dropout=0.0,
mask_time_prob=0.05,
layerdrop=0.1,
gradient_checkpointing=True,
ctc_loss_reduction="mean",
pad_token_id=processor.tokenizer.pad_token_id,
vocab_size=len(processor.tokenizer)
)
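After loading, a one-line sanity check can catch this class of error before training starts. A sketch with hypothetical sizes standing in for the real objects (in the notebook you would compare model.lm_head.out_features with len(processor.tokenizer)):

```python
# Hypothetical stand-ins for the real notebook values:
tokenizer_len = 60          # stands in for len(processor.tokenizer)
lm_head_out_features = 60   # stands in for model.lm_head.out_features

# If these differ, training fails later with a shape/CUDA assertion
# error that is much harder to trace back to the vocab size.
assert lm_head_out_features == tokenizer_len
print("vocab sizes match")
```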
Hi @Shiro,
I encountered the same issue. Did you solve this problem?
Thank you in advance and looking forward to hearing from you.
Best regards