Question about the VideoMAE loss function for multi-label fine-tuning

In modeling_videomae.py, the loss is computed as follows:

loss = None
if labels is not None:
    if self.config.problem_type is None:
        if self.num_labels == 1:
            self.config.problem_type = "regression"
        elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
            self.config.problem_type = "single_label_classification"
        else:
            self.config.problem_type = "multi_label_classification"

    if self.config.problem_type == "regression":
        loss_fct = MSELoss()
        if self.num_labels == 1:
            loss = loss_fct(logits.squeeze(), labels.squeeze())
        else:
            loss = loss_fct(logits, labels)
    elif self.config.problem_type == "single_label_classification":
        loss_fct = CrossEntropyLoss()
        loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
    elif self.config.problem_type == "multi_label_classification":
        loss_fct = BCEWithLogitsLoss()
        loss = loss_fct(logits, labels)
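To double-check which branch my labels hit, I reproduced the dtype-based dispatch outside the model (a minimal standalone sketch; num_labels and the tensor mirror my setup):

import torch

num_labels = 3
# Multi-hot labels as they come out of my dataset
# (torch.tensor of Python ints defaults to int64)
labels = torch.tensor([[0, 1, 1],
                       [1, 0, 0]])

# Mirrors the problem_type inference above
if num_labels == 1:
    problem_type = "regression"
elif num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
    problem_type = "single_label_classification"
else:
    problem_type = "multi_label_classification"

print(labels.dtype)   # torch.int64, which equals torch.long
print(problem_type)   # single_label_classification, not what I want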

I don’t get it. I’m trying to solve a multilabel classification problem, and my labels are:

tensor([[0, 1, 1],
        [1, 0, 0]], device='cuda:0')

So because my labels' dtype is int (torch.int64), the problem type gets set to "single_label_classification" instead of "multi_label_classification", and then CrossEntropyLoss throws a dimension mismatch exception.
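The exception makes sense once the single-label branch fires: CrossEntropyLoss expects class indices of shape (batch_size,), not multi-hot rows. A minimal repro with made-up logits:

import torch
from torch.nn import CrossEntropyLoss

logits = torch.randn(2, 3)            # (batch_size, num_labels)
labels = torch.tensor([[0, 1, 1],
                       [1, 0, 0]])    # multi-hot, dtype torch.int64

loss_fct = CrossEntropyLoss()
# view(-1, 3) keeps logits at (2, 3), but view(-1) flattens labels to (6,)
loss = loss_fct(logits.view(-1, 3), labels.view(-1))  # raises: batch sizes 2 vs 6 don't match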

Is this correct? Shouldn't multi-label classification also accept int labels? Should I change my labels' dtype to float?
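For what it's worth, here are the two workarounds I'm considering (a sketch; I'm not sure which is the intended approach):

# Option 1: cast the labels to float so the dtype check falls through to
# the multi-label branch (BCEWithLogitsLoss expects float targets anyway)
labels = labels.float()

# Option 2: set the problem type explicitly, so the dtype-based
# inference (which only runs when problem_type is None) is skipped
model.config.problem_type = "multi_label_classification"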

Thanks!