Hi there,
I’m experiencing an unexpected behaviour working on a summarization model:
As I increase the number of training samples, the validation loss decreases as expected, BUT the ROUGE metrics do not improve (do not increase). So the best model based on validation loss is not the best model based on ROUGE; they are not even close: a model with a considerably larger validation loss is the one with the best ROUGE scores.
What is your objective function for the validation loss? I think the answer to your question depends on several factors, the biggest of which are the kind of ROUGE you are using (R1 vs. R2 vs. RL) and the score range (high vs. low).
I am using the bigbird-pegasus model pretrained on bigpatent. So I think my loss is the negative log-probability of the validation sequences.
I understand that assigning higher probability to the reference sequences should lead to better ROUGE scores when generating. Shouldn't it?
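Not necessarily: the validation loss is computed with teacher forcing on the reference tokens, while ROUGE compares a freely generated summary against the reference, so the two can diverge. A minimal sketch below (toy ROUGE-1 F1 and mean NLL, not the exact metrics from any particular library) shows a case where the model with the lower loss still produces the summary with less overlap; the example texts and probabilities are made up for illustration.

```python
from collections import Counter
import math

def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between whitespace-tokenised texts (toy ROUGE-1)."""
    ref, cand = Counter(reference.split()), Counter(candidate.split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def mean_nll(token_probs: list[float]) -> float:
    """Validation loss: mean negative log-likelihood of the reference
    tokens under the model (teacher forcing)."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

reference = "the cat sat on the mat"

# Hypothetical model A: very confident on the reference tokens (low loss),
# but its generated summary shares few words with the reference.
loss_a = mean_nll([0.9] * 6)
rouge_a = rouge1_f1(reference, "a cat sat there")

# Hypothetical model B: less confident (higher loss),
# but its generation overlaps more with the reference.
loss_b = mean_nll([0.6] * 6)
rouge_b = rouge1_f1(reference, "the cat sat on a rug")

print(loss_a < loss_b)    # A has the lower validation loss
print(rouge_a < rouge_b)  # yet B has the higher ROUGE
```

The point of the sketch: loss scores the probability of the reference under the model, while ROUGE scores the surface overlap of whatever decoding (greedy, beam search, sampling) actually produces, so decoding choices and length effects can move ROUGE independently of loss.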