Hi all,
Are the Transformer model and tokenizer used in the paper ‘Attention is all you need’ available in HF?
I want to reproduce the result in the paper.
(‘We use Transformer (Vaswani et al., 2017) as the basic model structure’)
They got a BLEU score of 28.4 using the basic Transformer model on the en-de task.
Any help will be appreciated!