Hi everyone,
I want to fine-tune a Wav2Vec2 model, but I can't pad my inputs.
One entry of my dataset looks something like this:
```python
{'path': '/share/datasets/voxforge_todo/anonymhatschie-20140526-wth/wav/de3-47.wav',
 'sentence': 'das netzwerkkabel weist die notwendige qualität auf für schnelle datenübertragung',
 'sampling_rate': 16000,
 'speech': array([-0.00012025, -0.00104889, -0.00085019, ..., -0.00066073,
        -0.00037126, -0.00053336], dtype=float32),
 'input_values': tensor([[-0.0019, -0.0166, -0.0134, ..., -0.0104, -0.0059, -0.0084]]),
 'labels': tensor([[ 58,  54,  24,  37, 151,  51,  18,  43,  52,  51,  47,  61,  61,  54,
          27,  51, 110,  37,  52,  51,  39,  24,  18,  37,  58,  39,  51,  37,
         151,  67,  18,  52,  51, 151,  58,  39,  60,  51,  37,  30, 104,  54,
         110,  39,  18,  17,  18,  37,  54, 104,  19,  37,  19,  23,  47,  37,
          24, 113,  92, 151,  51, 110, 110,  51,  37,  58,  54,  18,  51, 151,
          23,  27,  51,  47,  18,  47,  54,  60, 104, 151,  60]])}
```
Here `speech` is the audio file read in with `torchaudio.load`, `sentence` is the target transcription, and the tensors were produced by:

```python
batch["input_values"] = self.processor(batch["speech"], return_tensors="pt",
                                       sampling_rate=16_000, padding=True).input_values
batch["labels"] = self.processor(batch["sentence"], return_tensors="pt", padding=True).input_ids
```
If I use this dataset to fine-tune the maxidl/wav2vec2-large-xlsr-german model, I get the following exception:

```
ValueError: Unable to create tensor, you should probably activate padding with 'padding=True' to have batched tensors with the same length.
```
The exception is raised by the following `self.processor.pad` call inside the `DataCollatorCTCWithPadding` class, which I copied from https://colab.research.google.com/github/patrickvonplaten/notebooks/blob/master/Fine_Tune_XLSR_Wav2Vec2_on_Turkish_ASR_with_🤗_Transformers.ipynb#scrollTo=lbQf5GuZyQ4_
```python
def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:
    # split inputs and labels since they have to be of different lengths and need
    # different padding methods
    input_features = [{"input_values": feature["input_values"]} for feature in features]
    label_features = [{"input_ids": feature["labels"]} for feature in features]

    batch = self.processor.pad(
        input_features,
        padding=self.padding,
        max_length=self.max_length,
        pad_to_multiple_of=self.pad_to_multiple_of,
        return_tensors="pt",
    )
```
To reproduce the problem with a minimal amount of code, I did this:

```python
import torch
from transformers import Wav2Vec2Processor

model_name = 'maxidl/wav2vec2-large-xlsr-german'
processor = Wav2Vec2Processor.from_pretrained(model_name)

fake_batch = [{'input_values': torch.rand([1, 112000])},
              {'input_values': torch.rand([1, 86000])}]
processor.pad(fake_batch, padding=True)
```

I still get the same exception, even though I passed `padding=True`:

```
ValueError: Unable to create tensor, you should probably activate padding with 'padding=True' to have batched tensors with the same length.
```
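As a sanity check, padding the same shapes manually does work once I squeeze away the leading batch dimension, which makes me suspect the extra `[1, N]` dimension from `return_tensors="pt"` is the problem. (`pad_sequence` here is just my stand-in for illustration, not what `processor.pad` does internally.)

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Same lengths as my fake_batch, but as 1-D tensors instead of [1, N].
waveforms = [torch.rand(112000), torch.rand(86000)]

# pad_sequence right-pads with zeros up to the longest sequence.
padded = pad_sequence(waveforms, batch_first=True)
print(padded.shape)  # torch.Size([2, 112000])
```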
Maybe you can help me.
Thanks in advance!