Default `problem_type`

kasticrunch · May 23, 2022, 7:57pm

Hello all!

I am working on classifying power plant outage reports into three target severity classes. I am new to NLP, but from what I have read, this seems like a straightforward classification task that BERT can assist with.

Initially I was able to fine-tune BERT and get predictions on a test set by simply defining my model as follows:
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

However, after some reading, it was not clear to me which loss function was being used with these settings. So I attempted to define the model with the following instead:

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3, problem_type="multi_label_classification")

This results in the following ValueError:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-33-ddb64c2d7f44> in <module>()
----> 1 output = trainer.train()
      2 print(output)

8 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in binary_cross_entropy_with_logits(input, target, weight, size_average, reduce, reduction, pos_weight)
   3128 
   3129     if not (target.size() == input.size()):
-> 3130         raise ValueError("Target size ({}) must be the same as input size ({})".format(target.size(), input.size()))
   3131 
   3132     return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)

ValueError: Target size (torch.Size([8])) must be the same as input size (torch.Size([8, 3]))

I have found a potential solution involving .unsqueeze, shown here: conv neural network - ValueError: Target size (torch.Size([16])) must be the same as input size (torch.Size([16, 1])) - Stack Overflow.

However, before I implement this, I have some questions that I hope to get feedback on:

What is the default loss function when problem_type is not set, that is, when binary_cross_entropy_with_logits is not being used?
Are there repercussions to using unsqueeze? “torch.unsqueeze — PyTorch 1.11.0 documentation”
How do I implement .unsqueeze? I see from the documentation that I need to pass in my input tensor, but at what stage in the Trainer API is my tensor created?

Below are the version of transformers I am using through Google Colab and the code leading up to the error.

- `transformers` version: 4.20.0.dev0
- Platform: Linux-5.4.188+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.13
- Huggingface_hub version: 0.6.0
- PyTorch version (GPU?): 1.11.0+cu113 (True)
- Tensorflow version (GPU?): 2.8.0 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>

CODE

! pip install git+https://github.com/huggingface/transformers.git
! pip install transformers datasets
! pip install pandas
! pip install comet_ml
! pip install comet_ml --upgrade
! pip install sklearn
! pip install transformers
! pip install --user urllib3==1.25.10
! pip install folium==0.2.1

import comet_ml
import pandas as pd
import numpy as np

from transformers import BertTokenizer, BertForSequenceClassification, TrainingArguments, Trainer
from datasets import load_dataset, dataset_dict, DatasetDict, Dataset, load_metric
from sklearn.model_selection import train_test_split
from pprint import pprint as pp
from sklearn.metrics import precision_recall_curve, roc_curve


comet_ml.init()

experiment = comet_ml.Experiment(
   project_name="confusion-matrix", 
)

N_CLASSES = 3
N_EPOCHS = 24
def compute_metrics(eval_pred):
    experiment = comet_ml.get_global_experiment()
    metric0 = load_metric("accuracy")
    metric1 = load_metric("precision")
    metric2 = load_metric("recall")
    metric3 = load_metric("f1")


    logits, labels = eval_pred
    print(logits)
    predictions = np.argmax(logits, axis=-1)
    accuracy = metric0.compute(predictions=predictions, references=labels)["accuracy"]
    precision = metric1.compute(predictions=predictions, references=labels, average="macro")["precision"]
    recall = metric2.compute(predictions=predictions, references=labels, average="macro")["recall"]
    f1 = metric3.compute(predictions=predictions, references=labels, average="macro")["f1"]

    experiment.log_confusion_matrix(predictions, labels)
    experiment.log_metric("accuracy",accuracy,epoch=N_EPOCHS)
    experiment.log_metric("precision",precision,epoch=N_EPOCHS)
    experiment.log_metric("recall",recall,epoch=N_EPOCHS)
    experiment.log_metric("f1",f1,epoch=N_EPOCHS)
    print(predictions,labels)

    return {"accuracy":accuracy,"precision": precision, "recall": recall, "f1":f1}

# Reading in of data
df = pd.read_csv('noRefuelingOutagesUnderSampled.csv',usecols=['text','target'])
df = df.dropna()
df = df.rename(columns={'target':'labels'})
df['labels'] = df['labels']
dataset = Dataset.from_pandas(df)

# Tokenize
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Splitting into train, test, validation
train_test = dataset.train_test_split(test_size=0.30) # split dataset
test_valid = train_test['test'].train_test_split(test_size=0.70) # validation 
mainDataset = DatasetDict({
    'train': train_test['train'],
    'test': test_valid['test'],
    'valid': test_valid['train']})
print(train_test)

# Wrapper
def tokenize_function(example):
  return tokenizer(example['text'],max_length=256,padding="max_length", truncation=True, add_special_tokens=True)

tokenized_datasets = mainDataset.map(tokenize_function,batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(['text','__index_level_0__'])

small_train_dataset = tokenized_datasets["train"].shuffle(seed=42)
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42)
full_train_dataset = tokenized_datasets["train"]
full_eval_dataset = tokenized_datasets["test"]

# Define Model
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3,problem_type="multi_label_classification")

training_args = TrainingArguments("test_trainer",
                                  evaluation_strategy="epoch", # Evaluation is done every epoch
                                  num_train_epochs=N_EPOCHS, # Number of epochs
                                  per_device_train_batch_size=8, # Training Batch size per GPU
                                  per_device_eval_batch_size=32, # Evaluation batch size per GPU
                                  logging_dir="bert_results/logs", 
                                  logging_steps=10,
                                  )

trainer = Trainer(model=model,
                  args=training_args,
                  train_dataset=full_train_dataset,
                  eval_dataset=full_eval_dataset,
                  compute_metrics=compute_metrics,
                  )

output = trainer.train()

I tried to be as clear as possible in this post, but I am still a rookie, so if I failed to provide some important info, please let me know!

Thanks again everyone!

sgugger · May 24, 2022, 12:04pm

Judging from your error message, it looks like you have one label per example. This is what we call a single-label problem (one label per text), not a multi-label problem (where each text can have several labels at the same time).
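To make the shape difference concrete, here is a minimal NumPy sketch (not the PyTorch implementation). The batch size of 8 and the 3 classes come from the error message; the label values and random logits are made up for illustration. It also shows why adding an axis, which is what unsqueeze does, would not make the target match the logits here.

```python
import numpy as np

batch_size, num_labels = 8, 3
rng = np.random.default_rng(0)

# Stand-in for the model output: one score per class, shape (8, 3).
logits = rng.normal(size=(batch_size, num_labels))
# Made-up single-label targets: one class id per text, shape (8,).
labels = np.array([0, 2, 1, 1, 0, 2, 2, 0])

# Single-label setup: cross-entropy picks the log-probability of the
# true class in each row, so an (8,) integer target is what it expects.
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
cross_entropy = -log_probs[np.arange(batch_size), labels].mean()

# Multi-label setup: the binary loss is element-wise, so the target must
# have the same shape as the logits. (8,) vs (8, 3) is the reported error.
shapes_match = labels.shape == logits.shape

# Adding an axis (what unsqueeze does) gives (8, 1), still not (8, 3).
unsqueezed = np.expand_dims(labels, axis=1)

# A target of the right shape for a multi-label loss: one 0/1 flag per
# class. It is built from the single labels purely for illustration.
multi_hot = np.eye(num_labels, dtype=np.float32)[labels]

print(labels.shape, unsqueezed.shape, multi_hot.shape)
```

With one severity class per report, the integer labels of shape (8,) are already in the form a single-label loss expects, so no reshaping is needed.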

So remove the problem_type argument and you should be fine.
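On the question of which loss is used by default: as far as I understand the sequence classification heads in transformers 4.x, when problem_type is left unset it is inferred from num_labels and the dtype of the labels on the first forward pass. The sketch below mirrors that fallback in plain Python. The helper name infer_problem_type and the lookup table are hypothetical, written only for illustration, so it is worth checking against the source of the installed version.

```python
def infer_problem_type(num_labels, labels_are_integers):
    """Hypothetical helper mirroring the fallback used when problem_type is None."""
    if num_labels == 1:
        return "regression"
    if num_labels > 1 and labels_are_integers:
        return "single_label_classification"
    return "multi_label_classification"


# PyTorch loss class applied for each problem type (names given as strings
# so that this sketch does not need torch installed).
LOSS_FOR_PROBLEM_TYPE = {
    "regression": "MSELoss",
    "single_label_classification": "CrossEntropyLoss",
    "multi_label_classification": "BCEWithLogitsLoss",
}

# The setup from this thread: three classes, one integer label per text.
problem_type = infer_problem_type(num_labels=3, labels_are_integers=True)
print(problem_type, LOSS_FOR_PROBLEM_TYPE[problem_type])
```

So with num_labels=3 and integer labels, leaving problem_type out should select cross-entropy, which matches the single-label setup described above.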

kasticrunch · May 25, 2022, 11:15pm

Awesome! Thank you for the feedback!
