How is tokens per second calculated while fine-tuning an LLM?

Hi everyone, how is tokens per second calculated during training? And how does it differ from tokens per second at inference?
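For context, here's a minimal sketch of how I'd naively measure it myself, assuming throughput = (steps × batch_size × seq_len) / wall-clock time. All the names here (`step_fn`, `batch_size`, `seq_len`) are made up for illustration, not from any particular framework:

```python
import time

def training_throughput(num_steps, batch_size, seq_len, step_fn):
    """Estimate tokens/sec over a training loop.

    Assumes each optimizer step processes batch_size * seq_len tokens
    (this counts padding tokens too; some reports exclude them).
    step_fn is a stand-in for one forward + backward + update step.
    """
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    tokens = num_steps * batch_size * seq_len
    return tokens / elapsed
```

My (possibly wrong) understanding is that inference throughput is often counted differently, e.g. only the generated tokens rather than the whole batch, which is part of what I'm asking about.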