Hello,
Before I explain, please understand that I am not from an English-speaking country and my English may not be the best.
I’m preparing an experiment to reproduce the “Attention Is All You Need” paper.
Since “Attention Is All You Need” is the original Transformer paper, I want to implement the vanilla Transformer with Hugging Face.
Implementing it with Hugging Face is convenient because the library provides a generate() function with many decoding options.
So, how can I implement an encoder-decoder Transformer with the same architecture as in “Attention Is All You Need”?
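My current guess is to build a randomly initialized EncoderDecoderModel out of two BERT-style stacks configured with the paper’s base hyperparameters (d_model=512, 6 layers, 8 heads, d_ff=2048, dropout 0.1). The vocab_size of 32000 below is just a placeholder, and I know BERT blocks use learned rather than sinusoidal positional embeddings, so this is only an approximation:

```python
# Sketch: approximating the "Attention Is All You Need" base model with
# Hugging Face's EncoderDecoderModel. Hyperparameters follow the paper:
# d_model=512, 6 layers, 8 attention heads, d_ff=2048, dropout=0.1.
# Caveat: BERT-style blocks use *learned* positional embeddings, not the
# paper's sinusoidal encodings, so this is not an exact replication.
from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

vocab_size = 32000  # placeholder; the paper uses a shared BPE/word-piece vocab

enc_cfg = BertConfig(
    vocab_size=vocab_size, hidden_size=512, num_hidden_layers=6,
    num_attention_heads=8, intermediate_size=2048,
    hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1,
)
dec_cfg = BertConfig(
    vocab_size=vocab_size, hidden_size=512, num_hidden_layers=6,
    num_attention_heads=8, intermediate_size=2048,
    hidden_dropout_prob=0.1, attention_probs_dropout_prob=0.1,
    is_decoder=True, add_cross_attention=True,  # decoder attends to encoder
)

config = EncoderDecoderConfig.from_encoder_decoder_configs(enc_cfg, dec_cfg)
model = EncoderDecoderModel(config=config)  # random init, no pretrained weights
```

After training, generate() should then be usable for decoding, though the special-token ids (decoder_start_token_id, pad_token_id) would still need to be set on the config to match the tokenizer. Is this the right direction?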
Thanks.