Reproducing "Attention Is All You Need"

Is there a pretrained model implemented in HF that has exactly the same architecture as the vanilla Transformer? I would like to just set the config for that model and reproduce the results from the paper "Attention Is All You Need".
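To make the target concrete, here is a rough sketch of the base-model settings from the paper that I would want a config to match, plus the paper's learning-rate schedule. The class name, field names, and `paper_lr` function are my own placeholders for illustration, not names from any library; I have not tried to map them onto a specific HF config class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VanillaTransformerBase:
    """Hypothetical container for the paper's base-model hyperparameters."""
    num_encoder_layers: int = 6   # N = 6 encoder layers
    num_decoder_layers: int = 6   # N = 6 decoder layers
    d_model: int = 512            # model / embedding dimension
    d_ff: int = 2048              # inner dimension of the feed-forward sublayer
    num_heads: int = 8            # attention heads (d_k = d_v = 64)
    dropout: float = 0.1          # residual dropout
    label_smoothing: float = 0.1  # label smoothing used during training
    warmup_steps: int = 4000      # warmup steps for the learning-rate schedule


def paper_lr(step: int, cfg: VanillaTransformerBase) -> float:
    """Schedule from the paper:
    lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
    """
    step = max(step, 1)  # avoid division by zero at step 0
    return cfg.d_model ** -0.5 * min(step ** -0.5, step * cfg.warmup_steps ** -1.5)


if __name__ == "__main__":
    cfg = VanillaTransformerBase()
    print("per-head dimension:", cfg.d_model // cfg.num_heads)
    for s in (1, 1000, 4000, 20000):
        print(s, paper_lr(s, cfg))
```

The schedule rises linearly during warmup and then decays with the inverse square root of the step, so the peak is at `warmup_steps`.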

Any help would be appreciated!