I am running `distilbert-base-multilingual-cased` on PyTorch. My model has 4 classes in the target.

In the code in models/distilbert/modeling_distilbert.py, I am reaching this branch:

```python
elif self.config.problem_type == "multi_label_classification":
    loss_fct = BCEWithLogitsLoss()
    loss = loss_fct(logits, labels)
```

I have two questions:
1. `BCEWithLogitsLoss` must receive the labels as one-hot vectors rather than integers. Must I take care of doing the one-hot encoding myself?
2. If I wish to add regularization terms to the loss, what is the best practice for doing so?


Note that multi_label_classification is only for problems where you can have multiple labels for one example, so you should use the default if your samples can only have one label.
If you are in a true multi-label problem, then it’s very likely your labels are already in a one-hot format.
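To illustrate what `BCEWithLogitsLoss` expects (a minimal sketch, not code from the library; the batch values are made up), the targets are multi-hot float vectors with the same shape as the logits:

```python
import torch
from torch.nn import BCEWithLogitsLoss

# Hypothetical batch: 2 examples, 4 classes (matching the model above)
logits = torch.randn(2, 4)

# Multi-hot float targets: each row can mark several active classes
labels = torch.tensor([[1., 0., 0., 1.],
                       [0., 1., 0., 0.]])

loss_fct = BCEWithLogitsLoss()
loss = loss_fct(logits, labels)  # scalar tensor
```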

For your second question, you should just output the logits of your model and then compute the loss manually, adding your penalty. If you’re using the Trainer API, you can subclass Trainer and override `compute_loss` to do that; see here for an example.
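The "base loss plus penalty" pattern can be sketched in plain PyTorch (this is the kind of computation you would put inside an overridden `compute_loss`; the L2 penalty and `lambda_reg` value are illustrative choices, not anything prescribed by the library):

```python
import torch
from torch.nn import BCEWithLogitsLoss

def loss_with_l2(logits, labels, parameters, lambda_reg=0.01):
    """Base BCE-with-logits loss plus an illustrative L2 weight penalty."""
    base = BCEWithLogitsLoss()(logits, labels)
    l2 = sum((p ** 2).sum() for p in parameters)
    return base + lambda_reg * l2

# Tiny stand-in model just to demonstrate the computation
model = torch.nn.Linear(8, 4)
x = torch.randn(2, 8)
labels = torch.tensor([[1., 0., 0., 1.],
                       [0., 1., 1., 0.]])

loss = loss_with_l2(model(x), labels, model.parameters())
loss.backward()  # gradients flow through both the base loss and the penalty
```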

Thanks for the quick answer

Q2 is pretty clear
Q1 - I realized that I gave labels of type int64. As we begin to train, in data_loader.py we have this block:

```python
if "label" in first and first["label"] is not None:
    label = first["label"].item() if isinstance(first["label"], torch.Tensor) else first["label"]
    dtype = torch.long if isinstance(label, int) else torch.float
```

Since `isinstance(label, int)` is False for an int64, it converted the labels to float, which routes to multi_label_classification due to the following block:
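The dtype inference can be checked directly (assuming the labels arrive as NumPy int64 scalars, which is common when they come from a pandas or NumPy array):

```python
import numpy as np
import torch

label_np = np.int64(3)  # e.g. a label read out of a NumPy/pandas array
label_py = 3            # a plain Python int

# On Python 3, NumPy integer scalars are NOT instances of Python's int
print(isinstance(label_np, int))   # False -> dtype becomes torch.float
print(isinstance(label_py, int))   # True  -> dtype becomes torch.long

# A torch int64 tensor's .item() returns a plain Python int, so tensors are fine
t = torch.tensor(3)
print(isinstance(t.item(), int))   # True
```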

```python
elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
    self.config.problem_type = "single_label_classification"
else:
    self.config.problem_type = "multi_label_classification"
```

The corollary is that one has to provide labels as plain Python `int`s (for which `isinstance(label, int)` is True) rather than NumPy int64 scalars.
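One way to apply that corollary (a sketch; the variable names and the mimicked inference step are illustrative) is to cast each label to a plain Python int before it reaches the collator:

```python
import numpy as np
import torch

raw_labels = np.array([0, 2, 1, 3], dtype=np.int64)  # e.g. from pandas

# Cast to plain Python ints so the dtype inference picks torch.long
labels = [int(l) for l in raw_labels]

# Mimic the collator's dtype inference on the first label
dtype = torch.long if isinstance(labels[0], int) else torch.float
batch_labels = torch.tensor(labels, dtype=dtype)
print(batch_labels.dtype)  # torch.int64 -> routed to single_label_classification
```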