I am looking for a TensorFlow implementation of the xlnet-large-cased model.
Mainly, I would like to know how to get the output of the last layer of XLNet (with gradients) in TensorFlow. A PyTorch implementation of this is available, but I could not find a TensorFlow one.