Creating a distilled version of the gelectra-base model

Hello all, I am trying to create a distilled version of the gelectra-base model. Training the student model requires defining an optimizer. Following the paper, I used the Adam optimizer, but the training losses do not look good. Does anyone have advice on which optimizer and settings to use?
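The sketch below shows two pieces that usually sit around the optimizer in a distillation run. It assumes a Hinton-style soft-target loss (KL divergence between temperature-softened teacher and student distributions, scaled by T^2) and a learning rate that warms up linearly and then decays linearly. Neither is necessarily what the paper prescribes. The helper names `lr_multiplier` and `distillation_loss` are hypothetical. In PyTorch the multiplier could be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR` on top of `torch.optim.Adam` or `torch.optim.AdamW`, and Hugging Face Transformers' `get_linear_schedule_with_warmup` produces the same shape.

```python
import math


def lr_multiplier(step: int, warmup_steps: int, total_steps: int) -> float:
    """Hypothetical helper: linear warmup from 0 to 1, then linear decay to 0.

    The returned factor multiplies the base learning rate of the optimizer.
    """
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    remaining = total_steps - step
    return max(0.0, remaining / max(1, total_steps - warmup_steps))


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Hypothetical helper: KL(teacher || student) on softened distributions.

    Multiplying by T^2 keeps gradient magnitudes comparable across temperatures
    (an assumed Hinton-style setup, not taken from the original post).
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (target)
    log_q = log_softmax(student_logits, temperature)  # student (prediction)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


if __name__ == "__main__":
    # Learning-rate factor at a few points of a 1000-step run with 100 warmup steps.
    for s in (0, 50, 100, 550, 1000):
        print(s, lr_multiplier(s, warmup_steps=100, total_steps=1000))
    # Loss is zero when the student matches the teacher and positive otherwise.
    print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
    print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

If the loss with plain Adam and a constant learning rate is unstable, adding warmup or lowering the base learning rate is a cheap first experiment to compare against.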