According to the documentation, the proper way to implement a custom loss function is to define the custom_loss method of the Trainer class: Trainer — transformers 4.0.0 documentation
Could someone clarify what the difference is between using custom_loss and the forward method to implement a custom loss function, and how the two are connected?
It depends on how you're training your model. If you use the Trainer API, you need to override the compute_loss method (there is no custom_loss method).
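A minimal sketch of a compute_loss override might look like the following. The weighted cross-entropy is just an illustrative choice, and the class weights are made up; the key point is the signature (model, inputs, return_outputs) and returning either the loss alone or a (loss, outputs) tuple:

```python
import torch
from transformers import Trainer

class CustomTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        # pull the labels out so the model doesn't compute its own loss
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        # illustrative custom loss: class-weighted cross-entropy
        # (the weights [1.0, 2.0] are placeholders for a 2-label task)
        loss_fct = torch.nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0]))
        loss = loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
```

You then instantiate CustomTrainer exactly as you would a plain Trainer; everything else (optimizer, scheduler, logging) stays the same.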
If you're training with native PyTorch, or with a framework like HuggingFace Accelerate, you can instead define the custom loss inside the model's forward method. You can then train the model as follows (assuming the forward method returns a tuple of logits and loss):
logits, loss = model(inputs, labels=labels)
loss.backward()
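Putting those two lines into a complete loop: here is a self-contained sketch with a toy nn.Module (MyModel, the linear layer, and the batch shapes are all invented for illustration) whose forward computes the loss when labels are passed, plus the optimizer step the snippet above omits:

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    # toy model for illustration: a single linear classifier
    def __init__(self, hidden_size=16, num_labels=2):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, inputs, labels=None):
        logits = self.classifier(inputs)
        loss = None
        if labels is not None:
            # the custom loss lives here; plain cross-entropy as a stand-in
            loss = nn.functional.cross_entropy(logits, labels)
        return logits, loss

model = MyModel()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

inputs = torch.randn(4, 16)           # dummy batch of 4 examples
labels = torch.tensor([0, 1, 0, 1])

logits, loss = model(inputs, labels=labels)
loss.backward()        # populate gradients
optimizer.step()       # update parameters
optimizer.zero_grad()  # reset gradients for the next batch
```

With the Trainer API the same loss logic would move into compute_loss; with Accelerate you would wrap the model, optimizer, and dataloader with accelerator.prepare and call accelerator.backward(loss) instead of loss.backward().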