I read an online tutorial about implementing fine-tuning from this website:
I do not understand why the author removed the last token ID for decoder_input_ids and the first token ID for labels, respectively.
See the following code:
import torch

for _, data in enumerate(loader, 0):
    # Full target sequence of token IDs
    y = data["target_ids"].to(device, dtype=torch.long)
    # decoder_input_ids: the target with its LAST token removed
    y_ids = y[:, :-1].contiguous()
    # labels: the target with its FIRST token removed
    lm_labels = y[:, 1:].clone().detach()
    # Replace padding IDs with -100 so the loss ignores those positions
    lm_labels[y[:, 1:] == tokenizer.pad_token_id] = -100
    ids = data["source_ids"].to(device, dtype=torch.long)
    mask = data["source_mask"].to(device, dtype=torch.long)
    outputs = model(
        input_ids=ids,
        attention_mask=mask,
        decoder_input_ids=y_ids,
        labels=lm_labels,
    )
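To make my question concrete, here is a toy example of what the two slices produce; the token IDs here are made up for illustration:

import torch

# Hypothetical target sequence, e.g. [start, A, B, C, eos] as made-up IDs
y = torch.tensor([[0, 5, 6, 7, 1]])

y_ids = y[:, :-1]     # tensor([[0, 5, 6, 7]]) -> decoder_input_ids
lm_labels = y[:, 1:]  # tensor([[5, 6, 7, 1]]) -> labels

So the labels look like the decoder inputs shifted left by one position. I can see what the slicing does, but not why this one-token shift is needed.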
I have read the original paper and the Hugging Face documentation, but I still do not understand. Could anyone explain this to me?