Meaning of the loss for Time Series Transformer/Informer/Autoformer

Can anyone help me understand what the loss means for Informer/Autoformer/Time Series Transformer?

I have checked the code and roughly understand that the prediction head outputs the parameters of a Student's t distribution, and that the loss is the negative log-likelihood (the negative log probability of the observed targets under that distribution). But I still don't quite understand the math behind the code.
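To make the math concrete, here is a minimal sketch in plain Python (stdlib only). It assumes the head has already produced, for each future time step, a degrees-of-freedom value ν (`df`), a location μ (`loc`) and a positive scale σ (`scale`), and that the training loss is the mean of -log p(y) over the prediction window. `student_t_log_prob` and `student_t_nll` are hypothetical helpers written for illustration, not functions from the Informer/Autoformer code; the real models work on batched tensors and may also apply data scaling and loss weights.

```python
import math
from typing import Sequence


def student_t_log_prob(y: float, df: float, loc: float, scale: float) -> float:
    """Log-density of a Student's t distribution (df, loc, scale) evaluated at y."""
    z = (y - loc) / scale  # standardized error
    return (
        math.lgamma((df + 1.0) / 2.0)
        - math.lgamma(df / 2.0)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)  # narrower distributions have higher peak density
        - (df + 1.0) / 2.0 * math.log1p(z * z / df)  # penalty for large errors
    )


def student_t_nll(
    targets: Sequence[float],
    dfs: Sequence[float],
    locs: Sequence[float],
    scales: Sequence[float],
) -> float:
    """Mean negative log-likelihood over the prediction window (hypothetical helper)."""
    losses = [
        -student_t_log_prob(y, df, loc, scale)
        for y, df, loc, scale in zip(targets, dfs, locs, scales)
    ]
    return sum(losses) / len(losses)


# Same targets and locations; only the predicted scale differs.
targets = [1.0, 2.0, 3.0]
locs = [1.1, 1.9, 3.05]
dfs = [5.0, 5.0, 5.0]
wide = student_t_nll(targets, dfs, locs, [1.0] * 3)
narrow = student_t_nll(targets, dfs, locs, [0.1] * 3)
print(f"wide scale: {wide:.3f}, narrow scale: {narrow:.3f}")
```

With identical point predictions, the narrower distribution assigns higher density to the targets, so its loss is lower; in this example the wide-scale loss is positive and the narrow-scale loss is negative.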

When I trained Informer on my own dataset, the loss decreased from positive to negative. Is this expected behavior?
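A sketch of why the sign can flip, assuming a Student's t head as described above: the loss is the negative log of a probability *density*, not of a probability, and a density can exceed 1. For a single target y with predicted degrees of freedom ν, location μ and scale σ:

$$
-\log p(y \mid \nu,\mu,\sigma) = -\log c(\nu) + \log\sigma + \frac{\nu+1}{2}\log\!\left(1 + \frac{1}{\nu}\left(\frac{y-\mu}{\sigma}\right)^{2}\right),
\qquad
c(\nu) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\nu\pi}} .
$$

The last term is non-negative and equals zero when y = μ, so at y = μ the loss is below zero exactly when σ < c(ν) (for example, c(2) = 1/(2√2) ≈ 0.354). As training makes predictions closer and the predicted σ smaller, the average loss can cross zero. The log σ term also means the value depends on the units of the target: if the targets and the predicted distribution are both multiplied by a > 0, the loss shifts by log a. So the sign by itself says little; what matters is whether the validation loss keeps decreasing on the same (scaled) data.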