Common Voice + LibriSpeech

I’ve recently been experimenting with wav2vec2 for ASR. My first fine-tuning run, on Common Voice in Italian, reached a WER of 12%. Now I’m thinking of taking the last checkpoint and continuing fine-tuning on LibriSpeech in Italian to lower the WER further. Is this possible?
I use the same notebook, load the LibriSpeech dataset, reuse the same vocabulary, and then load the model from the last checkpoint: “model = Wav2Vec2ForCTC.from_pretrained('model_checkpoint', …)”.
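Since reusing the vocabulary only works if the new transcripts contain no characters missing from it, a quick check along these lines may help. This is a minimal sketch, not part of my notebook: the function name `out_of_vocab_chars`, the toy vocabulary, and the sample transcripts are all made up for illustration, and it assumes a character-level vocabulary with "|" as the word delimiter.

```python
def out_of_vocab_chars(transcripts, vocab, word_delimiter="|"):
    """Return the sorted characters in `transcripts` that have no entry in `vocab`."""
    seen = set()
    for text in transcripts:
        # Spaces are mapped to the word delimiter before the vocabulary lookup.
        seen.update(text.replace(" ", word_delimiter))
    return sorted(seen - set(vocab))


# Hypothetical vocabulary in a character -> id layout (illustrative only).
vocab = {"a": 0, "c": 1, "i": 2, "o": 3, "|": 4, "[UNK]": 5, "[PAD]": 6}

# Hypothetical transcripts standing in for the new dataset.
new_transcripts = ["ciao", "cioè", "CIAO a"]

missing = out_of_vocab_chars(new_transcripts, vocab)
print(missing)  # characters the existing vocabulary cannot represent
```

Any character listed here (for example from different casing, accents, or punctuation) would not be representable with the old vocabulary, so the text normalization of the two datasets would need to match first.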
I ask because the WER starts at 22% at the first checkpoints. Maybe this is not the right approach, and it would have been better to stay on Common Voice and train for more epochs.
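To make the two WER figures comparable, the same metric can be computed on both test sets with the same checkpoint. Below is a minimal pure-Python sketch of corpus-level WER (word-level edit distance divided by the number of reference words). The function names and the example sentences are hypothetical, and a metrics library may normalize text differently.

```python
def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1], len(ref)


def corpus_wer(references, hypotheses):
    """Total word errors divided by total reference words."""
    errors = words = 0
    for ref, hyp in zip(references, hypotheses):
        e, n = word_errors(ref, hyp)
        errors += e
        words += n
    return errors / words


# Hypothetical reference transcripts and model outputs (illustrative only).
refs = ["il gatto dorme sul divano", "buongiorno a tutti"]
hyps = ["il gatto dorme sul divano", "buon giorno a tutti"]
wer = corpus_wer(refs, hyps)
print(f"WER: {wer:.2%}")
```

Running this on the Common Voice test set and on the LibriSpeech test set before any further training would show how much of the 22% is already there when the checkpoint first sees the new data.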

Thank you all for your attention.