Hi everyone, how is tokens per second calculated during training? And how does it differ from the calculation used for inference?