Is there a pretrained model implemented in HF that has exactly the same structure as the vanilla Transformer, so that I can just set the config for that model and reproduce the results from the paper "Attention Is All You Need"?
Any help would be appreciated!