Question about layer norm in T5

I noticed that the layer norm in T5 has no bias and does not subtract the mean.

I’m confused about what it means to compute a variance without subtracting the mean.

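Normally, we compute variance around the mean, for example:

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2, \qquad \mu = \frac{1}{n}\sum_{i=1}^{n} x_i$$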

But it’s different here. Why is that?
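
To make the difference concrete, here is a minimal sketch in plain Python of the two computations as I understand them. The function names `layer_norm` and `t5_style_norm` are my own and not the library's, the `eps` value is an assumption, and the standard version leaves out its learnable gain and bias for brevity.

```python
import math


def layer_norm(x, eps=1e-6):
    """Standard layer norm without gain/bias: center, then divide by the std."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def t5_style_norm(x, weight=None, eps=1e-6):
    """Scale-only norm: no mean subtraction, no bias.

    The "variance" here is just the mean of the squared inputs,
    so the input is divided by its root mean square.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    if weight is None:
        weight = [1.0] * len(x)  # learnable gain, assumed initialised to 1
    return [w * v * scale for w, v in zip(weight, x)]


x = [1.0, 2.0, 3.0, 4.0]
print(layer_norm(x))     # centered: the outputs sum to about zero
print(t5_style_norm(x))  # not centered: every output keeps its sign
```

The two only agree when the input already has zero mean, because then the mean of the squares equals the usual variance.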

I found a paper that explains this, in case anyone else has the same question.