Fine-tune wav2vec2-large-xlsr-53 for one epoch

Hi, I am preparing to fine-tune the wav2vec2-large-xlsr-53 checkpoint for English, and I will soon run the code on a cluster. The idea is to first reproduce the training for a language with plenty of data and then move to a language with a much smaller amount of data.

I first ran the model for less than one epoch, since Google Colab trains very slowly unless you have the Pro version, and I wanted to make sure the code works properly before moving it to the cluster. When I then tested the model to check the batch decoder, the predicted logits decoded to an empty string.

I tried this experiment on both an English dataset and an Irish dataset. The evaluation function itself is not the problem: I tested it with a pre-trained model and it works fine.

def map_to_result(batch):
  with torch.no_grad():
    input_values = torch.tensor(batch["input_values"], device="cuda").unsqueeze(0)
    logits = model(input_values).logits

  # Greedy CTC decoding: take the argmax over the vocabulary at each frame.
  pred_ids = torch.argmax(logits, dim=-1)
  batch["pred_str"] = processor.batch_decode(pred_ids)[0]
  # Decode the reference labels without collapsing repeated tokens.
  batch["text"] = processor.decode(batch["labels"], group_tokens=False)
  return batch
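For what it's worth, an empty decoded string is consistent with an undertrained CTC model: early in training the argmax often lands on the blank (pad) token at every frame, and CTC decoding collapses repeats and drops blanks, leaving nothing. A minimal sketch of that collapsing step (a hypothetical re-implementation for illustration, not the `transformers` API):

```python
BLANK_ID = 0  # in wav2vec2, the CTC blank is the tokenizer's pad token

def ctc_greedy_decode(ids, id_to_char):
    """Collapse consecutive repeats, then drop blank tokens."""
    out = []
    prev = None
    for i in ids:
        if i != prev and i != BLANK_ID:
            out.append(id_to_char[i])
        prev = i
    return "".join(out)

vocab = {1: "h", 2: "i"}
print(ctc_greedy_decode([1, 1, 0, 2], vocab))  # "hi"
print(ctc_greedy_decode([0, 0, 0, 0], vocab))  # "" — all-blank predictions decode to nothing
```

So if every frame's argmax is the blank id, `batch_decode` will return an empty string even though the pipeline is wired up correctly.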

I wanted to know whether this behavior happens because I didn’t train the model long enough, or whether there is a problem with my tokenizer or the saved model. Could somebody clarify this for me?
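One way to tell "undertrained" apart from "broken tokenizer/checkpoint" is to measure how often the argmax lands on the blank/pad id. A sketch of such a check (the helper name is mine; in practice you would pass the `logits` from `map_to_result` and `processor.tokenizer.pad_token_id`):

```python
import torch

def blank_fraction(logits: torch.Tensor, blank_id: int) -> float:
    """Fraction of timesteps whose argmax is the CTC blank (pad) token."""
    pred_ids = logits.argmax(dim=-1)
    return (pred_ids == blank_id).float().mean().item()

# Toy demonstration with synthetic logits over a 30-token vocabulary:
# a model that has collapsed to predicting blank everywhere returns ~1.0.
logits = torch.zeros(1, 50, 30)
logits[..., 0] = 5.0  # blank (id 0) dominates every frame
print(blank_fraction(logits, blank_id=0))  # 1.0
```

A fraction near 1.0 on your fine-tuned model would point to undertraining rather than a bug; a garbled but non-blank output would point more toward the tokenizer or the saved weights.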