Batch input for wav2vec2 pretraining

Hi, I have a question about how to pad audio when training a wav2vec2 model.

The tutorial explains how to handle a batch size of one:

import torch

# processor, model, and ds are set up earlier in the tutorial
input_values = processor(ds["speech"][0], return_tensors="pt").input_values  # batch size 1
logits = model(input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)

But I think I need to pad the shorter audio clips when the batch size is two or more.

Can I pad the shorter audio with 0, or is there a convenient function for that?

Thanks in advance.


[Found Answer]
It’s going to pad with 0.0. I found it…
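
For reference, here is a minimal sketch of what batching could look like, assuming a CTC checkpoint such as facebook/wav2vec2-base-960h (the checkpoint name and the random arrays standing in for ds["speech"] are just placeholders for this example). Passing a list of clips with padding=True makes the processor's feature extractor pad the shorter clip with its default padding_value of 0.0 up to the length of the longest clip in the batch.

import numpy as np
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

# Example checkpoint; substitute the model you are actually using.
name = "facebook/wav2vec2-base-960h"
processor = Wav2Vec2Processor.from_pretrained(name)
model = Wav2Vec2ForCTC.from_pretrained(name)

# Two 16 kHz clips of different lengths (stand-ins for ds["speech"][0], ds["speech"][1]).
clips = [np.random.randn(16000).astype(np.float32),
         np.random.randn(24000).astype(np.float32)]

# padding=True pads the shorter clip with the feature extractor's
# padding_value (0.0 by default) to the length of the longest clip.
inputs = processor(clips, sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values).logits  # shape: (2, time, vocab)
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids))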