Fine-tune BERT and CamemBERT for a regression problem

I am fine-tuning BERT on sentence ratings given on a scale of 1 to 9, but rather than measuring its accuracy at classifying into the same score/category/bin as the judges, I just want BERT’s score on a continuous scale, like 1, 1.1, 1.2, … up to 9. I also need to figure out how to do this with CamemBERT. What changes need to be made in the BertForSequenceClassification and CamembertForSequenceClassification modules, and what changes need to be made in preprocessing (like encode_plus)?

Hi @sundaravel, you can check the source code for BertForSequenceClassification here. It also has code for the regression case.

Specifically, for regression your last layer will have shape (hidden_size, 1), and you use MSE loss instead of cross-entropy.
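For illustration, here is a minimal sketch of that head (names like BertRegressor are my own, not library classes): a single linear layer on top of the pooled output, trained with nn.MSELoss. CamemBERT works the same way; swap in camembert-base with CamembertModel and CamembertTokenizer and nothing else changes:

import torch
from torch import nn
from transformers import BertModel, BertTokenizer

# A single linear layer of shape (hidden_size, 1) on top of the pooled
# output, trained with MSE loss instead of cross-entropy.
class BertRegressor(nn.Module):
    def __init__(self, model_name='bert-base-cased'):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        self.dropout = nn.Dropout(0.1)
        self.regressor = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask=None, labels=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled = self.dropout(outputs[1])           # pooled [CLS] representation
        preds = self.regressor(pooled).squeeze(-1)  # one continuous score per example
        loss = None
        if labels is not None:
            loss = nn.MSELoss()(preds, labels.float())
        return loss, preds

tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
model = BertRegressor()
enc = tokenizer.encode_plus('A sentence rated by the judges.', return_tensors='pt')
loss, preds = model(enc['input_ids'], enc['attention_mask'],
                    labels=torch.tensor([7.2]))  # continuous target on the 1-9 scale

Preprocessing with encode_plus stays exactly the same as for classification; only the labels change from class indices to floats.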

1 Like

Hi @valhalla,

I just created my account on this forum, and I can’t see the link you posted in your comment. It seems broken. Do you have another link to the source code?

Kind regards

Aah, yes. The directory structure changed, so the link no longer works. You can now find the class here

Thank you very much!

I’m also trying to do regression using BERT and am getting an error about Longs. Not sure what I’m doing wrong here.

import torch
from transformers import BertForSequenceClassification, AdamW, BertConfig

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

config = BertConfig()
config.num_labels = 1

model = BertForSequenceClassification(config).from_pretrained('bert-base-cased')
model.to(device)
model.train()

optim = AdamW(model.parameters(), lr=5e-5)

for epoch in range(3):
    for batch in dataloader:
        optim.zero_grad()
        input_ids = batch['input_ids'].to(device)
        print(input_ids)
        attention_mask = batch['attention_mask'].to(device)
        print(attention_mask)
        token_type_ids = batch['token_type_ids'].to(device)
        print(token_type_ids)
        labels = batch['labels'].to(device)
        print(labels)
        outputs = model(input_ids=input_ids, attention_mask=attention_mask, 
                        token_type_ids=token_type_ids, labels=labels)
        loss = outputs[0]
        loss.backward()
        optim.step()

model.eval()

Here’s the error (I printed out some data as well):

tensor([[ 101, 1198, 4841,  ...,    0,    0,    0],
        [ 101, 2960, 2254,  ...,    0,    0,    0],
        [ 101, 2866,  182,  ...,    0,    0,    0],
        ...,
        [ 101,  178,  112,  ...,    0,    0,    0],
        [ 101, 9294, 1128,  ...,    0,    0,    0],
        [ 101, 1268, 1185,  ...,    0,    0,    0]], device='cuda:0')
tensor([[1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        ...,
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0],
        [1, 1, 1,  ..., 0, 0, 0]], device='cuda:0')
tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        ...,
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0]], device='cuda:0')
tensor([0.2135, 0.9005, 0.4206, 0.2755, 0.5373, 0.5537, 0.2492, 0.4841, 0.8241,
        0.3545, 0.2813, 0.5674, 0.4098, 0.5857, 0.9476, 0.6094, 0.2778, 0.2974,
        0.3362, 0.3490, 0.9035, 0.7904, 0.4856, 0.1117, 0.3851, 0.7932, 0.9066,
        0.3630, 0.2709, 0.8578, 0.2255, 0.3292], device='cuda:0')
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-66-2f7282efa3c9> in <module>()
     24         print(labels)
     25         outputs = model(input_ids=input_ids, attention_mask=attention_mask, 
---> 26                         token_type_ids=token_type_ids, labels=labels)
     27         loss = outputs[0]
     28         loss.backward()

5 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   2262                          .format(input.size(0), target.size(0)))
   2263     if dim == 2:
-> 2264         ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
   2265     elif dim == 4:
   2266         ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

RuntimeError: Expected object of scalar type Long but got scalar type Float for argument #2 'target' in call to _thnn_nll_loss_forward

Any help is appreciated!

1 Like

[I’m guessing here]

‘target’ is probably ‘labels’. What type are your labels initially?

You could try specifying their type explicitly, using something like

tb_labels = batch[2].to(device, dtype=torch.float)

Try this link: python - RuntimeError: expected scalar type Long but found Float - Stack Overflow

1 Like

The targets are floats. I’ll try this, thanks for the suggestion!

@rgwatwormhill Shoot, that didn’t work unfortunately.

r"""
    labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
        Labels for computing the sequence classification/regression loss. Indices should be in :obj:`[0, ...,
        config.num_labels - 1]`. If :obj:`config.num_labels == 1` a regression loss is computed (Mean-Square loss),
        If :obj:`config.num_labels > 1` a classification loss is computed (Cross-Entropy).
    """

I have a couple of questions. The first is that it seems odd that torch.LongTensor would make sense for a regression problem; I’m not sure how the two are compatible. The second is: would I need to add an additional layer to BERT to do regression, or remove the logits layer and replace it?

1 Like

Looks like the model is initialized incorrectly. For regression we need num_labels=1, and you can do that in two ways:

config = BertConfig.from_pretrained("...", num_labels=1)
model = BertForSequenceClassification.from_pretrained("...", config=config)

or

model = BertForSequenceClassification.from_pretrained("...", num_labels=1)

Creating the model from a config and then calling from_pretrained again will override the config params. So in your code the model still has num_labels=2.

The first is that it seems odd that torch.LongTensor would make sense for a regression problem

Yes, the docstring should be corrected. But you can still pass a float tensor for a regression problem.
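A quick way to convince yourself (a sketch, with toy token ids): check model.config.num_labels after loading, then pass a float label through the forward pass:

import torch
from transformers import BertForSequenceClassification

# Sanity check: with num_labels=1 the regression branch is used,
# and float labels produce an MSE loss.
model = BertForSequenceClassification.from_pretrained('bert-base-cased', num_labels=1)
print(model.config.num_labels)   # 1

input_ids = torch.tensor([[101, 7592, 102]])   # toy token ids: [CLS] ... [SEP]
labels = torch.tensor([0.42])                  # float target, not Long
outputs = model(input_ids=input_ids, labels=labels)
print(outputs[0])                              # scalar MSE loss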

2 Likes

@valhalla Thank you so much! Such a small detail haha. Got it working.

Well, I got it “working” in that there are no errors now, but surprisingly (to me) the validation loss increases with every epoch. I’m not sure what I’m doing wrong.

import torch
import tqdm.notebook
from transformers import DistilBertForSequenceClassification, AdamW

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print(f'Device used for training: {device}')
model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=1)
model.to(device)
model.train()

optim = AdamW(model.parameters(), lr=5e-5)

for epoch in range(10):
    for batch in tqdm.notebook.tqdm(train_dataloader):
        optim.zero_grad()
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        # token_type_ids = batch['token_type_ids'].to(device)
        labels = batch['labels'].to(device)
        outputs = model(input_ids=input_ids, attention_mask=attention_mask, 
                        labels=labels)
        loss = outputs[0]
        loss.backward()
        optim.step()
    validation_loss = validate(test_dataset, model)
    print(f"Validation loss in epoch {epoch}: {validation_loss}")

Here is my validate function:

def validate(test_dataset, model):
    torch.cuda.empty_cache()

    test_dataloader = torch.utils.data.DataLoader(test_dataset, batch_size=128)
    total_loss = 0
    with torch.no_grad():
        for batch in tqdm.notebook.tqdm(test_dataloader):
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            # token_type_ids = batch['token_type_ids'].to(device)
            labels = batch['labels'].to(device)
            outputs = model(input_ids=input_ids, attention_mask=attention_mask,
                            labels=labels)
            total_loss += outputs[0].item()
    return total_loss

Here is some output showing the validation loss increasing:

Device used for training: cuda
Validation loss in epoch 0: 70428.39735794067
Validation loss in epoch 1: 69269.67090129852
Validation loss in epoch 2: 70188.32639312744
Validation loss in epoch 3: 72369.78367424011
Validation loss in epoch 4: 73700.79190158844
Validation loss in epoch 5: 74986.73181152344

Is the training loss also increasing?
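Also, two things worth checking in your validate function: it never calls model.eval(), so dropout is still active during evaluation, and it returns the sum of per-batch losses rather than the mean, which makes the absolute numbers hard to compare. A sketch of both fixes, assuming the same device and batch layout as your code above:

def validate(test_dataset, model):
    model.eval()   # disable dropout for evaluation
    test_dataloader = torch.utils.data.DataLoader(test_dataset, batch_size=128)
    total_loss = 0.0
    n_batches = 0
    with torch.no_grad():
        for batch in test_dataloader:
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)
            outputs = model(input_ids=input_ids, attention_mask=attention_mask,
                            labels=labels)
            total_loss += outputs[0].item()
            n_batches += 1
    model.train()  # restore training mode for the next epoch
    return total_loss / n_batches   # mean per-batch loss instead of a running sum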

1 Like

@valhalla can you post a code snippet showing how to set the number of output labels?

@thecity2 can you show how you modified the loss function to something suitable for regression?

I have a quick question: for regression problems, do we need to scale the output, for example to a range like [0, 10]?

Hi @Mahsaseifikar,

No (I don’t think so).

I am not an expert, and it has been a year since I last looked at any BERT code, but I don’t remember ever having to think about what scale to use for the output in my regression problem.