Way to train a basic Transformer

Is there a way to train a basic Transformer (the architecture presented in the original "Attention Is All You Need" paper)? I'm trying to set up a translation baseline on my custom dataset (no previously trained models exist for the language). If anyone can point me to a ready-made implementation of just the basic Transformer, that would be very helpful.


I think OpenNMT is a better framework for something like that. It makes it easy to train the kind of model you describe. transformers is geared more toward specialized, specific transformer architectures.


Thanks! I've used fairseq a few times, and I think OpenNMT is similar: both feel somewhat black-box, without full control over the entire pipeline. That's why I wanted to know whether this is possible with transformers, since it is native Python rather than a set of command-line tools.

Have you seen this: https://pytorch.org/tutorials/beginner/transformer_tutorial.html ?
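The core of that tutorial's approach can be sketched in a few lines with `torch.nn.Transformer`. This is an illustrative minimal version, not the tutorial's actual code: the vocabulary sizes and hyperparameters are made up, and positional encodings (which a real model needs) are omitted for brevity.

```python
import torch
import torch.nn as nn

# Illustrative sizes, not from the tutorial.
SRC_VOCAB, TGT_VOCAB, D_MODEL = 1000, 1000, 512

class Seq2SeqTransformer(nn.Module):
    """Vanilla encoder-decoder Transformer for translation.

    Note: positional encodings are omitted here for brevity;
    a usable model must add them to the embeddings.
    """
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, D_MODEL)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, D_MODEL)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=8,
            num_encoder_layers=6, num_decoder_layers=6,
            dim_feedforward=2048, batch_first=True)
        self.out = nn.Linear(D_MODEL, TGT_VOCAB)

    def forward(self, src, tgt):
        # Causal mask so the decoder cannot attend to future tokens.
        mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        h = self.transformer(self.src_emb(src), self.tgt_emb(tgt), tgt_mask=mask)
        return self.out(h)

model = Seq2SeqTransformer()
src = torch.randint(0, SRC_VOCAB, (2, 7))  # (batch, src_len)
tgt = torch.randint(0, TGT_VOCAB, (2, 5))  # (batch, tgt_len)
logits = model(src, tgt)                   # (batch, tgt_len, TGT_VOCAB)
```

Training then reduces to the usual cross-entropy loop over (source, shifted-target) pairs.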


And this is the TF2/Keras equivalent of @abhishek's recommendation :smiley:


Thank you! I implemented mine from scratch in PyTorch and Lightning, with the help of bentrevett's notebooks and the source code of BART in the transformers library.
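For anyone landing here later: a randomly initialized encoder-decoder can also be built directly in transformers from a config, which answers the original question without any command-line tooling. A minimal sketch (the sizes below are illustrative, not what anyone in this thread used):

```python
from transformers import BartConfig, BartForConditionalGeneration

# Small, randomly initialized BART-style encoder-decoder
# (no pretrained weights), usable as a from-scratch baseline.
# All hyperparameters here are made up for the example.
config = BartConfig(
    vocab_size=1000,
    d_model=64,
    encoder_layers=2,
    decoder_layers=2,
    encoder_attention_heads=4,
    decoder_attention_heads=4,
    encoder_ffn_dim=128,
    decoder_ffn_dim=128,
)
model = BartForConditionalGeneration(config)

# The model trains like any PyTorch module: passing `labels`
# makes it return a cross-entropy loss alongside the logits.
```

From here you can plug the model into a plain PyTorch training loop or the `Trainer` class.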

Thank you for the resources.
