Custom loss function: forward vs. compute_loss

According to the documentation, the proper way to implement a custom loss function is to override the compute_loss method of the Trainer class: Trainer — transformers 4.0.0 documentation

Other sources suggest inheriting from nn.Module and reimplementing the forward method: deep learning - Implementation of Focal loss for multi label classification - Stack Overflow
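For reference, the nn.Module approach usually means the loss itself is a module whose forward takes logits and targets. Below is a minimal sketch of a multi-label focal loss built on binary cross-entropy with logits. It is not the code from the linked answer; the class name FocalLoss and the defaults gamma=2.0 and alpha=0.25 are illustrative assumptions.

```python
from typing import Optional

import torch
import torch.nn as nn
import torch.nn.functional as F


class FocalLoss(nn.Module):
    """Sketch of a multi-label focal loss on top of BCE-with-logits.

    gamma and alpha are illustrative defaults, not values from the linked answer.
    """

    def __init__(self, gamma: float = 2.0, alpha: Optional[float] = 0.25, reduction: str = "mean"):
        super().__init__()
        self.gamma = gamma
        self.alpha = alpha
        self.reduction = reduction

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        targets = targets.float()
        # Per-element BCE, which equals -log(p_t)
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        # p_t: probability the model assigns to the true label of each element
        p_t = torch.exp(-bce)
        # Down-weight easy examples (p_t close to 1)
        loss = (1 - p_t) ** self.gamma * bce
        if self.alpha is not None:
            # alpha weights positives, (1 - alpha) weights negatives
            alpha_t = self.alpha * targets + (1 - self.alpha) * (1 - targets)
            loss = alpha_t * loss
        if self.reduction == "mean":
            return loss.mean()
        if self.reduction == "sum":
            return loss.sum()
        return loss
```

It is used like any other criterion, e.g. `loss = FocalLoss()(logits, targets)`. With gamma=0 and alpha=None it reduces to plain binary cross-entropy.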

Could someone clarify the difference between using compute_loss and the forward method to implement a custom loss function, and how the two are connected?


It depends on how you’re training your model. If you use the Trainer API, you need to override the compute_loss method.
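A minimal sketch of such an override, assuming a classification model whose output exposes logits (either a dict-like ModelOutput or a tuple with logits first). The class name WeightedLossTrainer and the class weights [1.0, 3.0] are hypothetical; `**kwargs` is there to absorb extra arguments that some newer transformers versions pass to compute_loss.

```python
import torch
import torch.nn as nn
from transformers import Trainer


class WeightedLossTrainer(Trainer):
    """Trainer whose loss is computed here rather than in the model's forward.

    The class weights are a hypothetical example for a 2-class problem.
    """

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # Remove the labels so the model does not compute its own loss
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        # ModelOutput is dict-like; fall back to the first tuple element otherwise
        logits = outputs["logits"] if isinstance(outputs, dict) else outputs[0]
        loss_fct = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 3.0], device=logits.device))
        loss = loss_fct(logits.view(-1, logits.size(-1)), labels.view(-1))
        return (loss, outputs) if return_outputs else loss
```

You would then construct WeightedLossTrainer exactly as you would a regular Trainer; the training loop calls compute_loss on every step.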

If you’re training with native PyTorch, or with a framework like Hugging Face Accelerate, you can define the custom loss in the model’s forward method. You can then train the model as follows (assuming the forward method returns a tuple of logits and loss):

logits, loss = model(inputs, labels=labels)
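To make that concrete, here is a minimal sketch with a hypothetical ToyClassifier whose forward computes the loss when labels are passed and returns (logits, loss) in the order used in the answer. Plain cross-entropy stands in for whatever custom loss you need, and the random data, SGD optimizer and learning rate are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyClassifier(nn.Module):
    """Hypothetical model whose forward computes the loss itself when labels are given."""

    def __init__(self, in_features: int = 8, num_classes: int = 3):
        super().__init__()
        self.linear = nn.Linear(in_features, num_classes)

    def forward(self, inputs, labels=None):
        logits = self.linear(inputs)
        loss = None
        if labels is not None:
            # The custom loss goes here; cross-entropy is just a stand-in
            loss = F.cross_entropy(logits, labels)
        # Same (logits, loss) order as in the answer's snippet
        return logits, loss


def train(model, inputs, labels, steps=50, lr=0.1):
    """Plain PyTorch loop: the loss comes straight out of model.forward."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        logits, loss = model(inputs, labels=labels)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


torch.manual_seed(0)
model = ToyClassifier()
inputs = torch.randn(32, 8)
labels = torch.randint(0, 3, (32,))
losses = train(model, inputs, labels)
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

Keeping the loss inside forward means every loop (plain PyTorch, Accelerate, or anything else) gets the same loss without extra code; the trade-off is that the model and the loss are coupled.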

I defined the forward function and used the Trainer API, and it looks like the Trainer is using the loss returned by the forward function. Does that make sense?
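One way to see why this happens: when compute_loss is not overridden, the Trainer takes the loss from the model's outputs. The function below is a simplified paraphrase of that selection logic as it looks in recent 4.x versions, not the library code itself; pick_loss is a hypothetical name, and branches such as label smoothing and past_index handling are omitted. It also shows why, for the Trainer, a tuple output should put the loss first rather than returning (logits, loss).

```python
from typing import Any


def pick_loss(outputs: Any) -> Any:
    """Simplified paraphrase of how Trainer's default compute_loss reads the loss.

    Not the library code: version-specific branches are omitted.
    """
    if isinstance(outputs, dict):
        # ModelOutput subclasses OrderedDict, so it takes this branch
        return outputs["loss"]
    # Plain tuples: the first element is treated as the loss
    return outputs[0]


logits = [[0.1, 0.9]]
print(pick_loss({"loss": 0.5, "logits": logits}))  # dict output: picks "loss"
print(pick_loss((0.5, logits)))                    # tuple with loss first: correct
print(pick_loss((logits, 0.5)))                    # (logits, loss): picks the logits
```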