Cross Entropy Loss and loss of HuggingFace T5ForConditionalGeneration do not match

BramVanroy · August 30, 2021, 2:51pm

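See my comments below. You replace 0 (the padding ID) with -100 only where you calculate the loss manually. The built-in T5ForConditionalGeneration forward pass does not do this replacement for you, so the labels you pass to the model still contain the padding tokens and they count towards its loss. To get matching values, do the replacement before you pass the labels to the model.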

# Here you get loss based on "target_text_input_ids" as-is (no ignored index)
loss, outputs = self(
    source_text_input_ids, source_text_attention_mask, target_text_input_ids
)
loss_mine = None

output = self.model(
    input_ids=source_text_input_ids,
    attention_mask=source_text_attention_mask,
    labels=target_text_input_ids,
)

# Here you first set the padding IDs to -100 so that CE will ignore them...
labels = batch["target_text_input_ids"].clone()
labels[labels == 0] = -100
if target_text_input_ids is not None:
    loss_fct = CrossEntropyLoss(ignore_index=-100)
    # ... and THEN you calculate loss
    loss_mine = loss_fct(output.logits.view(-1, output.logits.size(-1)), labels.view(-1))
    print(f"loss_huggingface: {loss.item()}, loss_mine : {loss_mine.item()}")
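To see why the two numbers differ, here is a minimal pure-Python sketch of mean cross entropy with an ignored index. It is not the library implementation; the helper names `mean_cross_entropy` and `mask_padding` and the toy logits are made up for illustration, and it assumes 0 is the padding ID as in the snippet above. Ignored positions are left out of both the sum and the token count, which is why masking the padding changes the average.

```python
import math

IGNORE_INDEX = -100  # the value the loss is told to skip in the post
PAD_ID = 0           # padding token ID, assumed to be 0 as in the post


def mean_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean token-level cross entropy, skipping labels equal to ignore_index."""
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # ignored tokens add nothing to the sum or the count
        # log-sum-exp, shifted by the row maximum for numerical stability
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[label]
        count += 1
    # If every position is ignored there is nothing to average.
    return total / count if count else float("nan")


def mask_padding(labels, pad_id=PAD_ID, ignore_index=IGNORE_INDEX):
    """Hypothetical helper: replace padding IDs so the loss skips them."""
    return [ignore_index if t == pad_id else t for t in labels]


# Toy sequence: two real tokens followed by two padding tokens (vocab size 3).
logits = [[0.1, 2.0, 0.3], [0.2, 0.1, 1.5], [0.5, 0.4, 0.3], [0.3, 0.6, 0.2]]
labels = [1, 2, 0, 0]

loss_unmasked = mean_cross_entropy(logits, labels)              # padding counted
loss_masked = mean_cross_entropy(logits, mask_padding(labels))  # padding skipped
print(f"unmasked: {loss_unmasked:.4f}, masked: {loss_masked:.4f}")
```

In the code above, the equivalent step would be to build `labels` with the -100 replacement first and pass that same tensor as `labels=` to the model, so both losses are computed over the same tokens.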
