I fine-tuned XLNet for English text classification, but I seem to have done something wrong, because xlnet-base performs worse than bert-base in my case. I report validation accuracy every 1/3 epoch. Early in training, bert-base is at about 0.50 while xlnet-base is at only 0.24. The config I use for XLNet is as follows:
config = dict(
    batch_size=4,
    learning_rate=1e-5,
    gradient_accumulation_steps=32,
    epochs=4,
    max_seq_length=384,
    weight_decay=0.01,
    adam_epsilon=1e-6,
    fp16=False,  # 16-bit (mixed-precision) training disabled
)
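For reference, here is a small sketch of how many optimizer updates this config performs between validation reports. With gradient accumulation, each update covers batch_size × gradient_accumulation_steps = 4 × 32 = 128 examples. The training-set size (10,000) and the helper name `update_schedule` are hypothetical, used only for illustration; plug in the real dataset size.

```python
import math

# Hypothetical training-set size; replace with the real number of examples.
NUM_TRAIN_EXAMPLES = 10_000


def update_schedule(num_examples, batch_size, accum_steps, epochs, evals_per_epoch=3):
    """Return (effective batch size, optimizer updates per epoch,
    updates between validation reports, total optimizer updates)."""
    effective_batch = batch_size * accum_steps
    batches_per_epoch = math.ceil(num_examples / batch_size)
    # One optimizer update happens every `accum_steps` mini-batches.
    updates_per_epoch = batches_per_epoch // accum_steps
    updates_per_eval = updates_per_epoch // evals_per_epoch
    total_updates = updates_per_epoch * epochs
    return effective_batch, updates_per_epoch, updates_per_eval, total_updates


print(update_schedule(NUM_TRAIN_EXAMPLES, batch_size=4, accum_steps=32, epochs=4))
# → (128, 78, 26, 312)
```

Under this assumed dataset size, the first 1/3-epoch report would come after only about 26 updates at a learning rate of 1e-5.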
Does fine-tuning XLNet need special settings, or does XLNet simply converge more slowly?
Thanks in advance to anyone willing to help!