Can anyone help me understand what the loss means for Informer/Autoformer/Time Series Transformer?
I have checked the code and roughly understand that the prediction head parameterizes a Student's t distribution, and that the loss is the negative log-likelihood (the negative log probability of the targets under that distribution). But I still don't quite understand the math behind the code.
When I trained the Informer on my own dataset, I observed that the loss decreased from positive to negative values. Is this expected behavior?
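
To make the question concrete, here is a minimal sketch of what I believe the loss computes for a single target value. It is my own reconstruction using only the standard library, not the library's actual code: the function name `student_t_nll` and the parameter names `df`, `loc`, and `scale` are my assumptions, and the real models average this quantity over batches and time steps.

```python
import math

def student_t_nll(x, df, loc, scale):
    """Negative log-likelihood of x under a Student's t distribution.

    Hypothetical helper for illustration; df, loc, scale are assumed to be
    the three values the distribution head predicts.
    """
    z = (x - loc) / scale
    log_prob = (
        math.lgamma((df + 1) / 2)      # log of the normalizing constant
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)              # shrinking scale raises the density
        - (df + 1) / 2 * math.log1p(z * z / df)
    )
    return -log_prob

# Same target and location, two different predicted scales.
wide = student_t_nll(0.0, df=3.0, loc=0.0, scale=1.0)
narrow = student_t_nll(0.0, df=3.0, loc=0.0, scale=0.01)

print(wide)    # positive: the density at the target is below 1
print(narrow)  # negative: the density at the target is above 1
```

If this is right, a negative loss would only mean that the predicted density at the target exceeds 1, which can happen for a continuous distribution when the predicted scale is small. I would appreciate confirmation that this is what is going on.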