I am fine-tuning the BERT model on sentence ratings given on a scale of 1 to 9, but rather than measuring its accuracy at classifying into the same score/category/bin as the judges, I just want BERT’s score on a continuous scale, like 1, 1.1, 1.2, … up to 9. I also need to figure out how to do this with CamemBERT. What changes need to be made in the BertForSequenceClassification and CamembertForSequenceClassification modules, and what changes need to be made in preprocessing (such as encode_plus)?
Hi @sundaravel, you can check the source code for BertForSequenceClassification here. It also has code for the regression case.
Specifically, for regression your last layer will be of shape (hidden_size, 1), and you use MSE loss instead of cross entropy.
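To make the point about the last layer concrete, here is a minimal sketch of a regression head in plain PyTorch. It is not the library's source code: the random `pooled_output` tensor is a stand-in for what BERT would produce, and `hidden_size = 768` is an assumption matching bert-base.

```python
import torch
from torch import nn

hidden_size = 768   # assumed: hidden size of bert-base
batch_size = 4

# Stand-in for the pooled sentence representation that BERT would produce.
pooled_output = torch.randn(batch_size, hidden_size)

# Regression head: a single output unit instead of one unit per class.
regressor = nn.Linear(hidden_size, 1)

# Float targets on the 1-9 rating scale from the question.
ratings = torch.tensor([1.0, 4.5, 7.2, 9.0])

predictions = regressor(pooled_output).squeeze(-1)   # shape: (batch_size,)
loss = nn.MSELoss()(predictions, ratings)            # mean squared error
loss.backward()                                      # gradients flow into the head
```

Because the targets are continuous, they stay as float tensors and are never converted to class indices.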
Hi @valhalla,
I have just created my account on this forum, and I can’t see the link you wrote in your comment. It seems broken. Do you have another link to the source code?
Kind regards
Aah, yes. The directory structure has changed, so the link no longer works. You can now find the class here
Thank you very much!
I’m also trying to do regression using BERT and getting an error about Longs? Not sure what I’m doing wrong here.
import torch
from transformers import BertForSequenceClassification, AdamW, BertConfig

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
config = BertConfig()
config.num_labels = 1
model = BertForSequenceClassification(config).from_pretrained('bert-base-cased')
model.to(device)
model.train()
optim = AdamW(model.parameters(), lr=5e-5)
for epoch in range(3):
    for batch in dataloader:
        optim.zero_grad()
        input_ids = batch['input_ids'].to(device)
        print(input_ids)
        attention_mask = batch['attention_mask'].to(device)
        print(attention_mask)
        token_type_ids = batch['token_type_ids'].to(device)
        print(token_type_ids)
        labels = batch['labels'].to(device)
        print(labels)
        outputs = model(input_ids=input_ids, attention_mask=attention_mask,
                        token_type_ids=token_type_ids, labels=labels)
        loss = outputs[0]
        loss.backward()
        optim.step()
model.eval()
Here’s the error (I printed out some data as well):
tensor([[ 101, 1198, 4841, ..., 0, 0, 0],
[ 101, 2960, 2254, ..., 0, 0, 0],
[ 101, 2866, 182, ..., 0, 0, 0],
...,
[ 101, 178, 112, ..., 0, 0, 0],
[ 101, 9294, 1128, ..., 0, 0, 0],
[ 101, 1268, 1185, ..., 0, 0, 0]], device='cuda:0')
tensor([[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
...,
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0]], device='cuda:0')
tensor([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], device='cuda:0')
tensor([0.2135, 0.9005, 0.4206, 0.2755, 0.5373, 0.5537, 0.2492, 0.4841, 0.8241,
0.3545, 0.2813, 0.5674, 0.4098, 0.5857, 0.9476, 0.6094, 0.2778, 0.2974,
0.3362, 0.3490, 0.9035, 0.7904, 0.4856, 0.1117, 0.3851, 0.7932, 0.9066,
0.3630, 0.2709, 0.8578, 0.2255, 0.3292], device='cuda:0')
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-66-2f7282efa3c9> in <module>()
24 print(labels)
25 outputs = model(input_ids=input_ids, attention_mask=attention_mask,
---> 26 token_type_ids=token_type_ids, labels=labels)
27 loss = outputs[0]
28 loss.backward()
5 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
2262 .format(input.size(0), target.size(0)))
2263 if dim == 2:
-> 2264 ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
2265 elif dim == 4:
2266 ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Expected object of scalar type Long but got scalar type Float for argument #2 'target' in call to _thnn_nll_loss_forward
Any help is appreciated!
[I’m guessing here]
‘target’ is probably ‘labels’. What type are your labels initially?
You could try specifying their type explicitly, using something like
tb_labels = batch[2].to(device, dtype = torch.float)
Try this link: the Stack Overflow question “python - RuntimeError: expected scalar type Long but found Float”.
The targets are floats. I’ll try this, thanks for the suggestion!
@rgwatwormhill Shoot, that didn’t work unfortunately.
r"""
labels (:obj:`torch.LongTensor` of shape :obj:`(batch_size,)`, `optional`):
Labels for computing the sequence classification/regression loss. Indices should be in :obj:`[0, ...,
config.num_labels - 1]`. If :obj:`config.num_labels == 1` a regression loss is computed (Mean-Square loss),
If :obj:`config.num_labels > 1` a classification loss is computed (Cross-Entropy).
"""
I have a couple of questions. The first is that it seems odd that torch.LongTensor would make sense for a regression problem. Not sure how that is compatible. The second question is whether I would need to add an additional layer to BERT to be able to do regression, or remove the logits layer and replace it.
It looks like the model is initialized incorrectly. For regression we need to use num_labels=1, and you can do it in two ways:
config = BertConfig.from_pretrained("...", num_labels=1)
model = BertForSequenceClassification.from_pretrained("...", config=config)
or
model = BertForSequenceClassification.from_pretrained("...", num_labels=1)
Creating the model from a config and then calling from_pretrained again will override the config params, so in your code the model still has num_labels=2.
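As a rough check of what num_labels=1 changes, the sketch below builds a tiny, randomly initialized BERT from a config, so nothing is downloaded. The architecture sizes and the dummy inputs are made up for illustration; a real run would load pretrained weights with from_pretrained as described in this reply.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny, made-up architecture so the example runs without downloading weights.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=1,            # one output unit: regression
)
model = BertForSequenceClassification(config)

# Dummy batch: 4 sequences of 8 token ids, with float targets.
input_ids = torch.randint(0, 100, (4, 8))
attention_mask = torch.ones_like(input_ids)
labels = torch.tensor([0.2135, 0.9005, 0.4206, 0.2755])

outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
loss, logits = outputs[0], outputs[1]

print(BertConfig().num_labels)         # default when num_labels is not set
print(model.classifier.out_features)   # size of the final layer
print(logits.shape)
```

With a single output unit the forward pass accepts float labels and no longer goes through the nll_loss path shown in the traceback.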
“The first is it seems odd that torch.LongTensor would make sense for a regression problem”
Yes, the docstring should be corrected. But you can still pass a float tensor for a regression problem.
@valhalla Thank you so much! Such a small detail haha. Got it working.
Well I got it “working” in that there are no errors now, but surprisingly to me, I am seeing the validation loss increase with every epoch. I’m not sure what I’m doing wrong.
import torch
import tqdm.notebook
from transformers import DistilBertForSequenceClassification, AdamW

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
print(f'Device used for training: {device}')
model = DistilBertForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=1)
model.to(device)
model.train()
optim = AdamW(model.parameters(), lr=5e-5)
for epoch in range(10):
    for batch in tqdm.notebook.tqdm(train_dataloader):
        optim.zero_grad()
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        # token_type_ids = batch['token_type_ids'].to(device)
        labels = batch['labels'].to(device)
        outputs = model(input_ids=input_ids, attention_mask=attention_mask,
                        labels=labels)
        loss = outputs[0]
        loss.backward()
        optim.step()
    validation_loss = validate(test_dataset, model)
    print(f"Validation loss in epoch {epoch}: {validation_loss}")
Here is my validate function:
def validate(test_dataset, model):
    torch.cuda.empty_cache()
    test_dataloader = torch.utils.data.DataLoader(test_dataset, batch_size=128)
    total_loss = 0
    with torch.no_grad():
        for batch in tqdm.notebook.tqdm(test_dataloader):
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            # token_type_ids = batch['token_type_ids'].to(device)
            labels = batch['labels'].to(device)
            outputs = model(input_ids=input_ids, attention_mask=attention_mask,
                            labels=labels)
            total_loss += outputs[0].item()
    return total_loss
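Two details of the code as posted may be worth a second look, although the thread does not confirm either as the cause of the rising loss. First, model.train() is called once and validate never switches to eval mode, so dropout stays active during validation. Second, the function returns a sum of per-batch mean losses, so its size depends on the number of batches. The sketch below shows one way to handle both. The function name, the toy model and the toy data are hypothetical stand-ins, and the toy model returns predictions rather than a loss.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def mean_validation_mse(model, dataloader, device):
    """Return the mean squared error per example, with dropout disabled."""
    was_training = model.training
    model.eval()                      # turn off dropout for evaluation
    total_squared_error = 0.0
    num_examples = 0
    with torch.no_grad():
        for features, labels in dataloader:
            features = features.to(device)
            labels = labels.to(device).float()
            predictions = model(features).squeeze(-1)
            total_squared_error += torch.sum((predictions - labels) ** 2).item()
            num_examples += labels.numel()
    if was_training:
        model.train()                 # restore training mode for the next epoch
    return total_squared_error / num_examples


# Hypothetical stand-in model and data, only to make the sketch runnable.
device = torch.device('cpu')
toy_model = nn.Sequential(nn.Dropout(0.5), nn.Linear(8, 1)).to(device)
toy_model.train()
toy_data = TensorDataset(torch.randn(10, 8), torch.rand(10))
toy_loader = DataLoader(toy_data, batch_size=4)

first = mean_validation_mse(toy_model, toy_loader, device)
second = mean_validation_mse(toy_model, toy_loader, device)
print(first, second)   # identical, because dropout is off in eval mode
```

Dividing by the number of examples makes the value comparable across datasets and batch sizes, which a summed loss is not.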
Here is some output showing the validation loss increasing:
Device used for training: cuda
100% 2871/2871 [1:49:25<00:00, 2.29s/it]
100% 1014/1014 [09:34<00:00, 1.76it/s]
Validation loss in epoch 0: 70428.39735794067
100% 2871/2871 [2:00:01<00:00, 2.51s/it]
100% 1014/1014 [09:47<00:00, 1.73it/s]
Validation loss in epoch 1: 69269.67090129852
100% 2871/2871 [2:01:22<00:00, 2.54s/it]
100% 1014/1014 [10:04<00:00, 1.68it/s]
Validation loss in epoch 2: 70188.32639312744
100% 2871/2871 [1:46:28<00:00, 2.23s/it]
100% 1014/1014 [09:35<00:00, 1.76it/s]
Validation loss in epoch 3: 72369.78367424011
100% 2871/2871 [1:47:28<00:00, 2.25s/it]
100% 1014/1014 [10:30<00:00, 1.61it/s]
Validation loss in epoch 4: 73700.79190158844
100% 2871/2871 [1:46:24<00:00, 2.22s/it]
100% 1014/1014 [09:24<00:00, 1.79it/s]
Validation loss in epoch 5: 74986.73181152344
Is the training loss also increasing?
@valhalla can you post a code snippet showing how to set the number of output labels?
@thecity2 can you show how you modified the loss function to something suitable for regression?
I have a quick question: do we need to scale the output, for example to a range such as [0, 10], for regression problems?
Hi @Mahsaseifikar ,
No (I don’t think so).
I am not an expert, and it has been a year since I last looked at any BERT code, but I don’t remember ever having to think about what scale to use for the output in my regression problem.
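In line with this reply, an MSE loss does not require the targets to lie in any particular range. If someone wants to rescale them anyway (a preference, not something this thread recommends), a linear mapping and its inverse are enough. The helper names below are hypothetical, and the 1 to 9 bounds are taken from the original question.

```python
def scale_rating(rating, low=1.0, high=9.0):
    """Map a rating from [low, high] to [0, 1]."""
    return (rating - low) / (high - low)


def unscale_rating(scaled, low=1.0, high=9.0):
    """Map a model output in [0, 1] back to the original rating scale."""
    return scaled * (high - low) + low


training_target = scale_rating(5.0)                 # value fed to the loss
reported_score = unscale_rating(training_target)    # value shown to a user
print(training_target, reported_score)
```

Predictions made on the scaled targets need the inverse mapping before they are compared with the judges' ratings.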