Hello
For Transformers models, it's best to use the model's built-in loss. For a more Keras-native implementation, and an explanation of why it should be done that way, you can check out this tutorial. You can simply call compile() without a loss.
As a convenience, all Transformers models come with a default loss that matches their output head, although you're of course free to use your own. Because the built-in loss is computed internally during the forward pass, you may find that some Keras metrics misbehave or give unexpected outputs when you use it.
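Here is a minimal sketch of compiling without a loss. It assumes a transformers release that still includes the TensorFlow model classes (e.g. TFBertForSequenceClassification) and TensorFlow 2.x. The tiny BertConfig sizes and the random toy data are hypothetical, chosen only so the snippet runs offline without downloading a checkpoint. In practice you would load a pretrained model with from_pretrained().

```python
import numpy as np
from transformers import BertConfig, TFBertForSequenceClassification

# Hypothetical tiny config so the sketch runs offline (weights are random).
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=2,
)
model = TFBertForSequenceClassification(config)

# Hypothetical toy data: 8 sequences of 10 token ids with binary labels.
rng = np.random.default_rng(0)
inputs = {
    "input_ids": rng.integers(0, config.vocab_size, size=(8, 10)).astype("int32"),
    "attention_mask": np.ones((8, 10), dtype="int32"),
}
labels = rng.integers(0, 2, size=(8,)).astype("int32")

# No `loss` argument: the model falls back to its built-in loss,
# which it computes from the labels during the forward pass.
model.compile(optimizer="adam")
history = model.fit(inputs, labels, batch_size=4, epochs=1, verbose=0)
print(history.history["loss"])
```

The labels are passed as the second argument to fit(), as with any Keras model, and the model uses them to compute its own loss. The string optimizer "adam" is used here only to keep the sketch short. For real fine-tuning, a much smaller learning rate (for example Adam with 3e-5) is more typical.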