Hi, I have a question about how to pad audio when training a wav2vec2 model.
The tutorial explains how to handle a batch size of one.
input_values = processor(ds["speech"][0], return_tensors="pt").input_values # Batch size 1
logits = model(input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
But I think I need to pad the shorter audio clips when the batch size is larger than one.
Can I pad the shorter audio with zeros, or is there a convenient function for that?
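For context, here is a minimal sketch of what zero-padding a batch looks like, with an attention mask marking real samples vs. padding (this is my own illustration, not the library's implementation; `pad_batch` is a hypothetical helper):

```python
import numpy as np

def pad_batch(clips, pad_value=0.0):
    """Zero-pad a list of 1-D audio arrays to the length of the longest clip.

    Returns (batch, attention_mask), where mask is 1 for real samples
    and 0 for padded positions.
    """
    max_len = max(len(c) for c in clips)
    batch = np.full((len(clips), max_len), pad_value, dtype=np.float32)
    mask = np.zeros((len(clips), max_len), dtype=np.int64)
    for i, c in enumerate(clips):
        batch[i, : len(c)] = c   # copy the real samples
        mask[i, : len(c)] = 1    # mark them as non-padding
    return batch, mask

# In practice, I believe the HF processor can do this for you by passing
# a list of arrays with padding=True, e.g. (untested sketch):
#   batch = processor(ds["speech"][:4], sampling_rate=16000,
#                     return_tensors="pt", padding=True)
#   logits = model(batch.input_values,
#                  attention_mask=batch.attention_mask).logits
```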
Thanks in advance.