What do I do when Gradients don't exist?

I’m following this tutorial on fine-tuning a model for text classification with TensorFlow. However, I’m using a custom dataset, which I convert to a Dataset object with Dataset.from_pandas() and then pass through prepare_tf_dataset as follows before handing the results to the model:

tf_train_set = model.prepare_tf_dataset(
    batch_size=16,
    collate_fn=collate_func()
)

tf_validation_set = model.prepare_tf_dataset(
    batch_size=16,
    collate_fn=collate_func()
)
I have no issues when I fine-tune BERT and DistilBERT, but as soon as I switch to XLNet I get the following warning:

WARNING:tensorflow:Gradients do not exist for variables ['tfxl_net_for_sequence_classification/transformer/mask_emb:0'] when minimizing the loss. If you're using `model.compile()`, did you forget to provide a `loss` argument?

I’ve tried passing binary cross-entropy as the loss argument to model.compile(), but it makes no difference. The accuracy and loss values change from epoch to epoch, so at least some variables must be receiving gradients.
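For reference, a minimal sketch of how a variable can end up with no gradient while the rest of the model still trains. It assumes TensorFlow 2.x in eager mode and uses two hypothetical variables, `used` and `unused`, that are not part of XLNet; `unused` only stands in for a weight that the forward pass never touches. Whether mask_emb falls into that category for sequence classification is an assumption here, not something the warning itself confirms.

```python
import tensorflow as tf

# Hypothetical variables for illustration only (not taken from XLNet).
# `used` takes part in the loss computation; `unused` does not.
used = tf.Variable([[1.0], [2.0]], name="used")
unused = tf.Variable([[3.0], [4.0]], name="unused")

x = tf.constant([[0.5, -1.0]])
y = tf.constant([[1.0]])

with tf.GradientTape() as tape:
    pred = tf.matmul(x, used)                   # only `used` is on the path to the loss
    loss = tf.reduce_mean(tf.square(pred - y))

# By default, tape.gradient returns None for variables
# that are not connected to the loss.
grads = tape.gradient(loss, [used, unused])

# Collect the names of variables that received no gradient.
missing = [name for name, g in zip(["used", "unused"], grads) if g is None]
print("Variables without gradients:", missing)
```

Running the same kind of check on model.trainable_variables for one batch would list exactly which weights are disconnected from the loss, which can help decide whether the warning is harmless.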

Any idea what’s going on?