Hi, I have a question about how to pad audio when training a wav2vec2 model.
The tutorial explains how to handle a batch size of one.
input_values = processor(ds["speech"][0], return_tensors="pt").input_values # Batch size 1
logits = model(input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
But I think I need to pad the shorter audio clips when the batch size is larger than one.
Can I pad the shorter audio with zeros, or is there a convenient function for that?
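For context, here is a minimal sketch of what zero-padding a batch looks like, with an attention mask marking real samples vs. padding (this is my own illustration, not the library's implementation; `pad_batch` is a hypothetical helper):

```python
import numpy as np

def pad_batch(clips, pad_value=0.0):
    """Zero-pad a list of 1-D audio arrays to the length of the longest clip.

    Returns (batch, attention_mask), where mask is 1 for real samples
    and 0 for padded positions.
    """
    max_len = max(len(c) for c in clips)
    batch = np.full((len(clips), max_len), pad_value, dtype=np.float32)
    mask = np.zeros((len(clips), max_len), dtype=np.int64)
    for i, c in enumerate(clips):
        batch[i, : len(c)] = c   # copy the real samples
        mask[i, : len(c)] = 1    # mark them as non-padding
    return batch, mask

# In practice, I believe the HF processor can do this for you by passing
# a list of arrays with padding=True, e.g. (untested sketch):
#   batch = processor(ds["speech"][:4], sampling_rate=16000,
#                     return_tensors="pt", padding=True)
#   logits = model(batch.input_values,
#                  attention_mask=batch.attention_mask).logits
```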
Thanks in advance.