Default `problem_type`

Hello all!

I am working on classifying power plant outage reports into three target severity classes. I am new to NLP, but from what I have read, this seems like a straightforward classification task that BERT can help with.

Initially I was able to fine-tune BERT and get predictions on a test set by simply creating my model with the following:
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

However, after some reading, it was not clear to me which loss function is used with these settings. So I tried creating the model with the following instead:

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3, problem_type="multi_label_classification")

This results in the following ValueError:

ValueError                                Traceback (most recent call last)
<ipython-input-33-ddb64c2d7f44> in <module>()
----> 1 output = trainer.train()
      2 print(output)

8 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/ in binary_cross_entropy_with_logits(input, target, weight, size_average, reduce, reduction, pos_weight)
   3129     if not (target.size() == input.size()):
-> 3130         raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
   3132     return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)

ValueError: Target size (torch.Size([8])) must be the same as input size (torch.Size([8, 3]))

I have found a potential solution involving .unsqueeze, shown in this Stack Overflow question: "conv neural network - ValueError: Target size (torch.Size([16])) must be the same as input size (torch.Size([16, 1]))".

However, before I go implementing this I have some questions that I hope to get feedback on,

  1. What is the default loss function when problem_type is not set, i.e. when I am not using binary_cross_entropy_with_logits?
  2. Are there repercussions to using unsqueeze? (torch.unsqueeze, PyTorch 1.11.0 documentation)
  3. How do I implement .unsqueeze? I see from the documentation that I need to pass in my input tensor, but at what stage in the Trainer API is my tensor being created?

Below are the version of transformers I am using through Google Colab and the code leading up to the error.

- `transformers` version: 4.20.0.dev0
- Platform: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.13
- Huggingface_hub version: 0.6.0
- PyTorch version (GPU?): 1.11.0+cu113 (True)
- Tensorflow version (GPU?): 2.8.0 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>


! pip install git+
! pip install transformers datasets
! pip install pandas
! pip install comet_ml
! pip install comet_ml --upgrade
! pip install sklearn
! pip install transformers
! pip install --user urllib3==1.25.10
! pip install folium==0.2.1

import comet_ml
import pandas as pd
import numpy as np

from transformers import BertTokenizer, BertForSequenceClassification, TrainingArguments, Trainer
from datasets import load_dataset, dataset_dict, DatasetDict, Dataset, load_metric
from sklearn.model_selection import train_test_split
from pprint import pprint as pp
from sklearn.metrics import precision_recall_curve, roc_curve


experiment = comet_ml.Experiment(

def compute_metrics(eval_pred):
    experiment = comet_ml.get_global_experiment()
    metric0 = load_metric("accuracy")
    metric1 = load_metric("precision")
    metric2 = load_metric("recall")
    metric3 = load_metric("f1")

    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    accuracy = metric0.compute(predictions=predictions, references=labels)["accuracy"]
    precision = metric1.compute(predictions=predictions, references=labels, average="macro")["precision"]
    recall = metric2.compute(predictions=predictions, references=labels, average="macro")["recall"]
    f1 = metric3.compute(predictions=predictions, references=labels, average="macro")["f1"]

    experiment.log_confusion_matrix(predictions, labels)

    return {"accuracy":accuracy,"precision": precision, "recall": recall, "f1":f1}

# Reading in of data
df = pd.read_csv('noRefuelingOutagesUnderSampled.csv',usecols=['text','target'])
df = df.dropna()
df = df.rename(columns={'target':'labels'})
df['labels'] = df['labels']
dataset = Dataset.from_pandas(df)

# Tokenize
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Splitting into train, test, validation
train_test = dataset.train_test_split(test_size=0.30) # split dataset
test_valid = train_test['test'].train_test_split(test_size=0.70) # validation 
mainDataset = DatasetDict({
    'train': train_test['train'],
    'test': test_valid['test'],
    'valid': test_valid['train']})

# Wrapper
def tokenize_function(example):
  return tokenizer(example['text'],max_length=256,padding="max_length", truncation=True, add_special_tokens=True)

tokenized_datasets =,batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(['text','__index_level_0__'])

small_train_dataset = tokenized_datasets["train"].shuffle(seed=42)
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42)
full_train_dataset = tokenized_datasets["train"]
full_eval_dataset = tokenized_datasets["test"]

# Define Model
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3,problem_type="multi_label_classification")

training_args = TrainingArguments("test_trainer",
                                  evaluation_strategy="epoch", # Evaluation is done every epoch
                                  num_train_epochs=N_EPOCHS, # Number of epochs
                                  per_device_train_batch_size=8, # Training Batch size per GPU
                                  per_device_eval_batch_size=32, # Evaluation Batch size per GPU

trainer = Trainer(model=model,

output = trainer.train()

I tried to be as clear as possible in this post, but I am still a rookie. So if I forgot to provide some important info, please let me know!

Thanks again everyone!

Judging from your error message, it looks like you have one label per example, so this is what we call a single-label problem (one label per text), not a multi-label problem (where each text can have several labels at the same time).
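To make the shape mismatch concrete, here is a minimal sketch in plain PyTorch. It assumes random stand-in tensors with the same shapes as in the traceback (batch of 8, 3 classes); the variable names are made up for illustration and nothing here comes from the Trainer itself. It also suggests why .unsqueeze would not help: it only turns [8] into [8, 1], which still does not match [8, 3].

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch_size, num_labels = 8, 3

# Stand-ins for the model output and the dataset labels (random values, shapes only).
logits = torch.randn(batch_size, num_labels)          # shape [8, 3]
labels = torch.randint(0, num_labels, (batch_size,))  # shape [8], one class index per text

# Single-label setup: cross-entropy accepts class indices of shape [8] directly.
ce_loss = F.cross_entropy(logits, labels)

# Multi-label setup: BCE-with-logits needs float targets shaped like the logits.
try:
    F.binary_cross_entropy_with_logits(logits, labels.float())
    shape_error = None
except ValueError as err:
    shape_error = str(err)  # "Target size ... must be the same as input size ..."

# unsqueeze only adds a dimension of size 1: [8] becomes [8, 1], still not [8, 3].
unsqueezed = labels.unsqueeze(1)

# If each text really could carry several labels, targets would be multi-hot rows.
multi_hot = F.one_hot(labels, num_classes=num_labels).float()  # shape [8, 3]
bce_loss = F.binary_cross_entropy_with_logits(logits, multi_hot)

print(ce_loss.shape, unsqueezed.shape, multi_hot.shape)
```

One-hot encoding would make the multi-label loss run, but for mutually exclusive severity classes the single-label setup is the better fit.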

So remove the problem_type argument and you should be fine :slight_smile:
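On the question of which loss is used by default: as far as I know, in the 4.x releases the sequence classification models infer problem_type from num_labels and the dtype of the labels when it is left unset. The sketch below paraphrases that logic in plain Python; infer_problem_type is a hypothetical helper written for illustration, not a function in the library.

```python
def infer_problem_type(num_labels, labels_are_integer):
    """Hypothetical helper: paraphrases how a loss is picked when problem_type is None."""
    if num_labels == 1:
        # A single output is treated as regression.
        return "regression", "MSELoss"
    if num_labels > 1 and labels_are_integer:
        # Integer class indices (one per example) mean single-label classification.
        return "single_label_classification", "CrossEntropyLoss"
    # Anything else (e.g. float multi-hot targets) is treated as multi-label.
    return "multi_label_classification", "BCEWithLogitsLoss"


# The setup in this thread: 3 classes and one integer label per text.
print(infer_problem_type(3, True))
```

So with num_labels=3 and integer labels, leaving problem_type out should give you cross-entropy, which is what you want for mutually exclusive classes.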

Awesome! Thank you for the feedback!