Causal LM on GLUE dataset

I am able to benchmark masked LMs such as BERT, DeBERTa, and RoBERTa on the GLUE dataset using the script provided in transformers.
However, when I switch to a causal LM, training does not progress well: the training loss goes to zero almost immediately.
I suspect the loss function differs between the two kinds of model. Is it possible to fine-tune a causal LM on GLUE this way?
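
To make the suspicion concrete, here is a minimal pure-Python sketch of the two losses I have in mind. It is not the transformers implementation, and the function names (`classification_loss`, `causal_lm_loss`) are hypothetical. I am assuming a GLUE task is scored with one cross-entropy per sequence over the task labels, while a causal LM head is scored with one cross-entropy per position over the vocabulary, using the next token as the target.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one softmax distribution against one target index."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def classification_loss(class_logits, label):
    # Sequence classification: one set of logits per sequence,
    # sized to the number of task labels (e.g. 2 for a binary task).
    return cross_entropy(class_logits, label)


def causal_lm_loss(token_logits, input_ids):
    # Causal language modelling: one set of logits per position,
    # sized to the vocabulary. Logits at position t are scored
    # against the token at position t + 1 (shifted targets).
    pairs = zip(token_logits[:-1], input_ids[1:])
    losses = [cross_entropy(logits, nxt) for logits, nxt in pairs]
    return sum(losses) / len(losses)


# Toy inputs: 2 task labels, a vocabulary of 4 tokens, a 3-token sequence.
cls_loss = classification_loss([0.0, 0.0], label=1)
lm_loss = causal_lm_loss([[0.0] * 4 for _ in range(3)], input_ids=[2, 0, 3])
print(cls_loss, lm_loss)
```

If the GLUE script ends up with the second kind of head, or with labels shaped for it, the reported loss would not measure the classification task at all, which is the kind of mismatch I am wondering about.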