How to compute per-token loss when doing language modeling?

Thanks for your reply @chrisociepa. Yes, I was able to get it working with reduction='none'.
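For anyone finding this later, here is a minimal sketch of the approach in PyTorch. It assumes a causal LM whose forward pass returns logits of shape (batch, seq_len, vocab), and labels that mark padding with -100 (the Hugging Face convention). The helper name `per_token_loss` is hypothetical. The key point is that `reduction="none"` makes `cross_entropy` return one loss per position instead of a single averaged scalar.

```python
import torch
import torch.nn.functional as F


def per_token_loss(logits, labels, ignore_index=-100):
    """Hypothetical helper: per-token cross-entropy for a causal LM.

    logits: (batch, seq_len, vocab), labels: (batch, seq_len).
    Returns a (batch, seq_len - 1) tensor of losses.
    """
    # Shift so the logits at position t are scored against the token at t + 1
    shift_logits = logits[:, :-1, :]
    shift_labels = labels[:, 1:]
    # reduction="none" keeps one loss value per flattened position;
    # positions whose label equals ignore_index get a loss of 0
    loss = F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=ignore_index,
        reduction="none",
    )
    # Restore the (batch, seq_len - 1) layout
    return loss.reshape(shift_labels.shape)
```

Because ignored positions come back as 0 rather than being dropped, averaging over `labels[:, 1:] != -100` reproduces the default mean-reduced loss, and summing per row gives a per-sequence loss.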