Hi,
The original “Attention Is All You Need” architecture placed layer normalization after each sub-layer's residual connection (Post-Normalization). Which models/configs implement Pre-Normalization layers instead of Post-Normalization layers? And is the placement configurable, or would I have to edit the model code?
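To be clear about what I mean by the two placements, here is a minimal sketch in plain Python. It is not taken from any library: `toy_sublayer` is a made-up stand-in for the attention or feed-forward sub-layer, and `layer_norm` leaves out the learned gain and bias for brevity.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance.
    Simplification: no learned gain/bias parameters."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def post_norm_block(x, sublayer):
    """Post-Normalization (original Transformer): LayerNorm(x + Sublayer(x))."""
    y = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)])

def pre_norm_block(x, sublayer):
    """Pre-Normalization: x + Sublayer(LayerNorm(x))."""
    y = sublayer(layer_norm(x))
    return [a + b for a, b in zip(x, y)]

def toy_sublayer(x):
    # Hypothetical placeholder for attention / feed-forward.
    return [2.0 * v for v in x]

x = [1.0, 2.0, 4.0]
post_out = post_norm_block(x, toy_sublayer)  # output itself is normalized
pre_out = pre_norm_block(x, toy_sublayer)    # residual path is left unnormalized
```

In the Post-Normalization case the block output is normalized; in the Pre-Normalization case only the sub-layer input is normalized and the residual stream passes through unchanged.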
Thanks in advance