Negative "cross entropy" loss function

tdobrxl · December 15, 2022, 3:34pm

I have overridden the compute_loss function as follows:

```python
import torch
from transformers import Seq2SeqTrainer

# ALPHA (weight of the subgraph loss) is defined elsewhere (not shown).
class CustomTrainer(Seq2SeqTrainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        logits, g_output = model(**inputs)
        labels = inputs.get("labels")
        graph_labels = inputs.get("subgraph")
        loss_fn = torch.nn.CrossEntropyLoss()  # by default, index with value -100 will be ignored
        extra_loss_fct = torch.nn.BCEWithLogitsLoss()
        loss = (loss_fn(logits.view(-1, logits.shape[-1]), labels.view(-1))
                + ALPHA * extra_loss_fct(g_output, graph_labels.float()))
        return (loss, logits) if return_outputs else loss
```
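To see what compute_loss computes in isolation, here is a minimal standalone sketch on random tensors. The sizes B, T, V, K, the value ALPHA = 0.5 and the helper name combined_loss are assumptions for illustration only; the loss expression itself mirrors the override above.

```python
import torch

# Hypothetical sizes, chosen only for illustration.
B, T, V, K = 2, 4, 10, 3   # batch, target length, vocab size, subgraph size (N+E)
ALPHA = 0.5                # assumed weight of the auxiliary subgraph loss

torch.manual_seed(0)
logits = torch.randn(B, T, V)               # decoder logits from the seq2seq head
labels = torch.randint(0, V, (B, T))        # target token ids (int64)
labels[:, -1] = -100                        # padded positions, ignored by CrossEntropyLoss
g_output = torch.randn(B, K)                # raw (pre-sigmoid) subgraph scores
graph_labels = torch.randint(0, 2, (B, K))  # multi-hot subgraph targets


def combined_loss(logits, labels, g_output, graph_labels, alpha=ALPHA):
    ce = torch.nn.CrossEntropyLoss()    # ignore_index defaults to -100
    bce = torch.nn.BCEWithLogitsLoss()  # applies the sigmoid internally
    token_loss = ce(logits.view(-1, logits.shape[-1]), labels.view(-1))
    graph_loss = bce(g_output, graph_labels.float())
    return token_loss + alpha * graph_loss


loss = combined_loss(logits, labels, g_output, graph_labels)
print(loss.item())
```

Because CrossEntropyLoss averages only over the non-ignored positions, the logits at positions labelled -100 have no effect on the result.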

Also, I’ve overridden the forward function as follows:

```python
import torch
from transformers import AutoModelForSeq2SeqLM, PreTrainedModel

# MeanPooling, model_checkpoint, D_EMBEDDING and D_GRAPH are defined elsewhere (not shown).
class Text2MultiTargetsV2(PreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.transformer = AutoModelForSeq2SeqLM.from_pretrained(model_checkpoint)
        self.mp_layer = MeanPooling()
        self.subgraph = torch.nn.Linear(D_EMBEDDING, D_GRAPH)

    def forward(self, input_ids, attention_mask, labels=None, subgraph=None):
        encoder_output = self.transformer.get_encoder()(
            input_ids=input_ids,
            attention_mask=attention_mask
        )
        # compute extra output for the subgraph of shape [B, N+E]
        x_mean = self.mp_layer(encoder_output[0], attention_mask)
        subg = self.subgraph(x_mean)  # sigmoid to be included in the loss
        nlp_out = self.transformer(
            input_ids=input_ids,
            attention_mask=attention_mask,
            labels=labels).logits
        return nlp_out, subg

    def generate(self, model_inputs, **kwargs):
        return self.transformer.generate(
            model_inputs,
            **kwargs)
```
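MeanPooling is not shown in the post. A common choice, assumed here purely as a hypothetical stand-in, is masked mean pooling over the encoder's last hidden state. It turns a [B, L, D] tensor into [B, D], so that the Linear(D_EMBEDDING, D_GRAPH) head can produce the [B, N+E] subgraph scores:

```python
import torch


class MeanPooling(torch.nn.Module):
    """Hypothetical masked mean pooling over the sequence dimension.

    Averages token embeddings [B, L, D] into one vector per example [B, D],
    counting only positions where attention_mask == 1.
    """

    def forward(self, last_hidden_state, attention_mask):
        mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # [B, L, 1]
        summed = (last_hidden_state * mask).sum(dim=1)                   # [B, D]
        counts = mask.sum(dim=1).clamp(min=1e-9)                         # [B, 1], avoids division by zero
        return summed / counts


pool = MeanPooling()
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])  # [1, 3, 2]
mask = torch.tensor([[1, 1, 0]])                                   # last token is padding
print(pool(hidden, mask))  # → tensor([[2., 3.]])
```

The padded token (all 100s) is excluded, so the result is the mean of the first two rows only.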

Then I used the custom trainer to train the model above.

```python
trainer = CustomTrainer(
    model,
    args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['valid'],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
```

The training went well and the model predicts correct results.
The only problem is that whenever a training epoch finishes, the reported evaluation loss is a negative number, which is weird.
I added a debug print to compute_loss and found that the function is not called during evaluation.
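As a quick sanity check on why a negative value is suspicious: assuming ALPHA ≥ 0, both terms of the loss in compute_loss are bounded below by zero. For one token with logits $z$ and target class $y$, and one subgraph score $s$ with target $t \in \{0, 1\}$ ($\sigma$ is the sigmoid):

```latex
\mathcal{L}_{\text{CE}} = -\log \frac{e^{z_y}}{\sum_j e^{z_j}} \ge 0,
\qquad
\mathcal{L}_{\text{BCE}} = -\bigl[t \log \sigma(s) + (1 - t)\log\bigl(1 - \sigma(s)\bigr)\bigr] \ge 0
```

Softmax and sigmoid outputs lie in (0, 1), so their logarithms are negative and each term above is non-negative. Averages of non-negative terms stay non-negative, hence $\mathcal{L}_{\text{CE}} + \text{ALPHA} \cdot \mathcal{L}_{\text{BCE}} \ge 0$, and a negative evaluation loss cannot have come from this function.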

Could you tell me why the overridden function is not called?

Many thanks
