Dropout before layer normalization

In the first stage of BartDecoder, the forward pass computes

  1. the token embeddings
  2. plus the positional embeddings
  3. layer normalization
  4. dropout (optional)

x = self.embed_tokens(input_ids)               # token embeddings
x += positions                                 # add positional embeddings
x = self.layernorm(x)                          # layer normalization
x = F.dropout(x, p=p, training=self.training)  # dropout (F is torch.nn.functional)

I am thinking of moving the dropout to right after the token embedding, i.e. before adding the positional embeddings, so that only the token embeddings are made noisy:

x = self.embed_tokens(input_ids)               # token embeddings
x = F.dropout(x, p=p, training=self.training)  # dropout on token embeddings only
x += positions                                 # add positional embeddings
x = self.layernorm(x)                          # layer normalization

Is there a known reason or common belief that dropout needs to be placed after layer normalization?