Retraining an ELECTRA model with different embeddings from scratch

Hello all,

I have implemented an ELECTRA model with different (ELMo-based) embeddings. I based my code on the BERT version from GitHub - helboukkouri/character-bert: Main repository for "CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters". However, I have an issue with training. If I train plain ELECTRA from scratch, everything works correctly and the loss decreases.
However, when I try to train the new version, the loss gets stuck and oscillates (the same happened with the original BERT as well). I have tried training the ELMo part separately with a different network, and it seems to work as it should.
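
For context, the embedding swap is conceptually something like the sketch below (simplified PyTorch with illustrative names, not my exact code): the word-piece embedding lookup is replaced by an ELMo-style character-CNN module, while the position/type embeddings and the rest of ELECTRA stay unchanged.

```python
import torch.nn as nn

class CharacterEmbeddings(nn.Module):
    """ELMo-style character-CNN embeddings (illustrative stand-in for the
    CharacterBERT module): one vector per token, built from its characters."""
    def __init__(self, num_chars, char_dim, hidden_size):
        super().__init__()
        self.char_embed = nn.Embedding(num_chars, char_dim)
        self.cnn = nn.Conv1d(char_dim, hidden_size, kernel_size=3, padding=1)

    def forward(self, char_ids):
        # char_ids: (batch, seq_len, max_chars_per_token)
        b, s, c = char_ids.shape
        x = self.char_embed(char_ids.view(b * s, c))   # (b*s, c, char_dim)
        x = self.cnn(x.transpose(1, 2))                # (b*s, hidden, c)
        x = x.max(dim=-1).values                       # max-pool over characters
        return x.view(b, s, -1)                        # (b, s, hidden)
```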
Any ideas on how to debug this, or how to find out what is wrong?
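
The only diagnostic I can think of so far is logging per-parameter gradient norms after the backward pass, to check whether gradients actually reach the new embedding module (again just a sketch):

```python
def log_grad_norms(model):
    # Print the gradient norm of every parameter after loss.backward();
    # zero norms on the embedding module would suggest a broken graph,
    # huge norms an exploding-gradient / learning-rate problem.
    for name, param in model.named_parameters():
        if param.grad is not None:
            print(f"{name}: {param.grad.norm().item():.3e}")
```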

Thank you all