Fine-tune Wav2Vec2ForCTC from a pre-fine-tuned model

I have a Wav2Vec2ForCTC model that has already been fine-tuned and performs very well. I want to fine-tune it further on a small audio dataset.
So my problem is fine-tuning a model that has already been fine-tuned once. The original model was fine-tuned from facebook/wav2vec2-large-xlsr-53.

Here is what I tried:

import torch
from transformers import Wav2Vec2ForCTC

# Initialize from the base checkpoint, then overwrite with the fine-tuned weights
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-xlsr-53", vocab_size=processor.tokenizer.vocab_size)
model.load_state_dict(torch.load('./wav2vec2model/pytorch_model.bin'))
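My expectation was based on the fact that `load_state_dict` with the default `strict=True` replaces every parameter of the target model, so the random initialization of the base checkpoint should not matter once loading succeeds. A minimal PyTorch sketch of that behavior (toy `nn.Linear` modules standing in for the real models):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two identically shaped toy models: "base" stands in for the freshly
# initialized Wav2Vec2ForCTC, "finetuned" for the saved checkpoint.
base = nn.Linear(4, 2)
finetuned = nn.Linear(4, 2)

# Loading a full state dict (strict=True by default) overwrites every
# parameter of the target model, including randomly initialized ones.
base.load_state_dict(finetuned.state_dict())

# After loading, the target's weights match the checkpoint exactly.
assert torch.equal(base.weight, finetuned.weight)
assert torch.equal(base.bias, finetuned.bias)
```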

Then I created the TrainingArguments and set up the Trainer. The problem is that after training for a couple of steps the model's performance is very poor, while I expected it to improve, since load_state_dict loaded all the weights from the fine-tuned version.

Can you help?