How to reproduce XLNet results correctly, and what is the right config for fine-tuning XLNet?

I fine-tuned XLNet for English text classification, but I seem to have done something wrong, because xlnet-base performs worse than bert-base in my case. I report validation accuracy every 1/3 epoch. At the beginning, BERT-base is at about 0.50 while XLNet-base is only at 0.24. The config I use for XLNet is as follows:

config = {
  "batch_size": 4,
  "learning_rate": 1e-5,
  "gradient_accumulation_steps": 32,
  "epochs": 4,
  "max_sep_length": 384,
  "weight_decay": 0.01,
  "adam_epsilon": 1e-6,
  "16-bit_training": False,
}
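
For context, here is a rough sketch of what this config implies for the effective batch size and the number of optimizer updates. The dataset size of 10,000 examples is a made-up placeholder (not my real dataset), the helper names are only for illustration, and the exact update count depends on how a given training loop handles the last partial accumulation window.

```python
import math


def effective_batch_size(batch_size: int, gradient_accumulation_steps: int) -> int:
    """Number of examples contributing to each optimizer update."""
    return batch_size * gradient_accumulation_steps


def optimizer_steps(num_examples: int, batch_size: int,
                    gradient_accumulation_steps: int, epochs: int) -> int:
    """Approximate number of optimizer updates over the whole run.

    Assumes the last partial batch and the last partial accumulation
    window of each epoch still trigger an update (an assumption; some
    training loops drop them instead).
    """
    micro_batches = math.ceil(num_examples / batch_size)
    updates_per_epoch = math.ceil(micro_batches / gradient_accumulation_steps)
    return updates_per_epoch * epochs


# Values taken from the config above.
print(effective_batch_size(4, 32))

# Hypothetical dataset size, for illustration only.
print(optimizer_steps(10_000, 4, 32, 4))
```

So each update sees 4 * 32 = 128 examples, which means a 1/3-epoch checkpoint corresponds to relatively few weight updates at a learning rate of 1e-5.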

Does fine-tuning XLNet need special settings, or does XLNet simply converge more slowly?

Thanks in advance to everyone willing to help! :slight_smile: