Speech Encoder Decoder training

Hello, I'm trying to train the new Speech Encoder Decoder model using wav2vec2 as the encoder and BERT as the decoder.
I didn't find any tutorial, so I tried to adapt a wav2vec2 training script, but I get this error when I run the training:

KeyError                                  Traceback (most recent call last)

in ()
----> 1 trainer.train()

3 frames

/usr/local/lib/python3.7/dist-packages/transformers/file_utils.py in __getitem__(self, k)
   1934         if isinstance(k, str):
   1935             inner_dict = {k: v for (k, v) in self.items()}
-> 1936             return inner_dict[k]
   1937         else:
   1938             return self.to_tuple()[k]

KeyError: 'loss'

The code is also available here: Google Colab

Any idea how to fix this? :sweat_smile: