Can I use Transformer-XL for a text classification task?

I want to use Transformer-XL for text classification, but I don't know what model architecture to use for this task. I currently apply dense layers with a softmax activation to the logits output by the Transformer-XL model, but this doesn't seem right: during training the loss does not decrease and the accuracy stays very low.
I built and trained the model from scratch on the IMDB dataset.
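
To make the setup concrete, here is a minimal NumPy sketch of the kind of classification head described above. It is not the actual training code. The hidden states are random stand-ins for the Transformer-XL outputs, and the names `mean_pool`, `classifier_logits`, and `softmax_cross_entropy` are hypothetical helpers introduced only for illustration. The sketch assumes the sequence of hidden states is mean-pooled over non-padding tokens into one vector per review, then projected to two classes. The softmax is applied once, inside the loss; one thing worth checking in a setup like this is whether softmax is applied in the dense layer and then again by a loss that expects raw logits.

```python
import numpy as np

def mean_pool(hidden, mask):
    """Average hidden states over non-padding positions.

    hidden: (batch, seq_len, d_model); mask: (batch, seq_len), 1 for real tokens.
    """
    mask = mask[..., None].astype(hidden.dtype)
    summed = (hidden * mask).sum(axis=1)
    counts = np.maximum(mask.sum(axis=1), 1.0)  # avoid division by zero
    return summed / counts

def classifier_logits(pooled, W, b):
    """Linear projection to class scores; no softmax here."""
    return pooled @ W + b

def softmax_cross_entropy(logits, labels):
    """Cross-entropy computed from raw logits (softmax applied once, here)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Assumed toy sizes; the real model dimensions are not given in the post.
rng = np.random.default_rng(0)
batch, seq_len, d_model, num_classes = 4, 16, 32, 2

hidden = rng.normal(size=(batch, seq_len, d_model))  # stand-in for model outputs
mask = np.ones((batch, seq_len), dtype=np.int64)
mask[0, 10:] = 0                                     # first example is padded

W = rng.normal(scale=0.02, size=(d_model, num_classes))
b = np.zeros(num_classes)
labels = np.array([0, 1, 1, 0])                      # e.g. negative / positive

pooled = mean_pool(hidden, mask)            # (batch, d_model)
logits = classifier_logits(pooled, W, b)    # (batch, num_classes)
loss = softmax_cross_entropy(logits, labels)
print(pooled.shape, logits.shape, float(loss))
```

With an untrained head on two classes, the loss should start near ln 2 (about 0.69); a loss that stays at that value throughout training means the predictions are no better than chance.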
My training step:

My logits: