Hi everyone, I’m new to deep learning. I came across the wonderful Transformers library last week and am attempting to fine-tune the BlenderBot model. I work at a charity that helps people with cancer, and I wanted to see if I can use a chatbot on our online cancer forum to answer basic questions for our users.
Just to test whether I can get the fine-tuning process to work, I’m initially loading a simple dataset from a CSV file with numeric labels and corresponding strings, using the load_dataset function, e.g.
1, Acute lymphoblastic leukaemia is a type of blood cancer it starts from white blood cells called lymphocytes in the bone marrow.
1, acute lymphoblastic leukaemia usually develops quickly over days or weeks
1, To understand how and why leukaemia affects you as it does, it helps to know how you make blood cells.
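For reference, I load the file roughly like this (the file name is a placeholder; my CSV has no header row, so I name the two columns myself):

from datasets import load_dataset

# Two-column CSV: numeric label, then the sentence
raw = load_dataset(
    "csv",
    data_files="cancer_sentences.csv",
    column_names=["label", "text"],
)["train"]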
I followed the documentation for fine-tuning the model from the Hugging Face course, which is nicely written and seemed fairly straightforward:
from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration
import torch
checkpoint = "facebook/blenderbot-400M-distill"
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
tokenizer = BlenderbotTokenizer.from_pretrained(checkpoint)
model = BlenderbotForConditionalGeneration.from_pretrained(checkpoint).to(device)
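The checkpoint itself loads fine, and as a quick sanity check I can generate a reply with the stock model, along these lines (the utterance is just a placeholder):

# Quick check that the stock model generates before any fine-tuning
inputs = tokenizer(["What is leukaemia?"], return_tensors="pt").to(device)
reply_ids = model.generate(**inputs)
print(tokenizer.batch_decode(reply_ids, skip_special_tokens=True))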
I then created my dataset class and passed in the encodings from the tokenizer step:

train_encodings = tokenizer(train_texts, truncation=True, padding=True)
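For completeness, train_texts, val_texts and the label lists come from a simple hold-out split. I used something like scikit-learn’s train_test_split (the 80/20 ratio is just my choice) and tokenized the validation texts the same way:

from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for validation
train_texts, val_texts, train_labels, val_labels = train_test_split(
    raw["text"], raw["label"], test_size=0.2
)
val_encodings = tokenizer(val_texts, truncation=True, padding=True)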
import torch

class CancerDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        # Build one example: every tokenizer output plus its label
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        check_input_ids = item["input_ids"]  # kept only for debugging
        return item

    def __len__(self):
        return len(self.labels)

train_dataset = CancerDataset(train_encodings, train_labels)
val_dataset = CancerDataset(val_encodings, val_labels)
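To see what the dataset actually returns, I looked at one batch; this is where the shapes I mention further down come from:

from torch.utils.data import DataLoader

# Peek at a single batch of 8 to inspect tensor shapes
check_loader = DataLoader(train_dataset, batch_size=8)
batch = next(iter(check_loader))
print(batch["input_ids"].shape)  # torch.Size([8, 88])
print(batch["labels"].shape)     # torch.Size([8]) -- only one dimension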
and then tried to train the model:
from torch.utils.data import DataLoader
from transformers import AdamW

device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model = BlenderbotForConditionalGeneration.from_pretrained(checkpoint).to(device)
model.train()

train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
optim = AdamW(model.parameters(), lr=5e-5)

for epoch in range(3):
    for batch in train_loader:
        optim.zero_grad()
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)
        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs[0]
        loss.backward()
        optim.step()
but I’m getting an error message saying too many indices for tensor of dimension 1. I suspected this might be an issue with the shape of the labels (which have a shape of torch.Size([8])) rather than the input_ids, which have a shape of torch.Size([8, 88]). I looked online for reshaping a PyTorch tensor and came across the unsqueeze(0) method for adding a dimension, but when I applied it to the label tensor it didn’t work.
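Specifically, what I tried inside the training loop was roughly:

# Attempted fix: add a leading dimension, torch.Size([8]) -> torch.Size([1, 8])
labels = batch['labels'].to(device).unsqueeze(0)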
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-57-42b54a5f7d1d> in <module>
25 print(f" ---- 3b ---- labels shape {labels.shape}")
26
---> 27 outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
28 loss = outputs[0]
29 loss.backward()
(2 intermediate frames omitted)
/usr/local/lib/python3.7/dist-packages/transformers/models/blenderbot/modeling_blenderbot.py in shift_tokens_right(input_ids, pad_token_id, decoder_start_token_id)
67 """
68 shifted_input_ids = input_ids.new_zeros(input_ids.shape)
---> 69 shifted_input_ids[:, 1:] = input_ids[:, :-1].clone()
70 shifted_input_ids[:, 0] = decoder_start_token_id
71
IndexError: too many indices for tensor of dimension 1
Any advice on how I might fix this would be really appreciated. Thank you.